A Scalable Media Job Framework for an Open Source Search Engine

Authors: Pooja Mishra, Chris Pollett

Abstract:

This paper explores efficient ways to implement media-updating features such as news aggregation, video conversion, and bulk email handling. These jobs are all periodic, and they all benefit from being handled in a distributed fashion. Their data also often comes from a social or collaborative source. We isolate the class of periodic, one-round map reduce jobs as a useful setting in which to describe and handle media-updating tasks. Because such tasks are simpler than general map reduce jobs, programming them on a general map reduce platform can easily become tedious. This paper presents the MediaUpdater module of the Yioop Open Source Search Engine Web Portal, which is designed to handle such jobs via an extension of a PHP class. We describe how to implement various media-updating tasks in our system and report experiments carried out using these implementations on an Amazon Web Services cluster.
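
To make the notion of a periodic, one-round map reduce job concrete, the following is a minimal sketch under stated assumptions. It is written in Python rather than PHP, and every name in it (PeriodicOneRoundJob, FeedAggregationJob, split, map_task, reduce, run_if_due) is hypothetical; it is not the MediaUpdater API, and the workers are simulated sequentially in a single process.

```python
import time
from abc import ABC, abstractmethod


class PeriodicOneRoundJob(ABC):
    """Hypothetical base class; NOT Yioop's actual MediaUpdater API."""

    def __init__(self, period_seconds):
        self.period_seconds = period_seconds
        self.last_run = None  # timestamp of the most recent run, if any

    def is_due(self, now):
        """Return True if the job never ran or its period has elapsed."""
        return (self.last_run is None
                or now - self.last_run >= self.period_seconds)

    @abstractmethod
    def split(self, num_workers):
        """Partition the pending work into one batch of tasks per worker."""

    @abstractmethod
    def map_task(self, task):
        """Process one task on a worker and return its partial result."""

    @abstractmethod
    def reduce(self, partial_results):
        """Combine all partial results in a single reduce step."""

    def run_if_due(self, num_workers, now=None):
        """Run exactly one map round and one reduce step if the job is due."""
        now = time.time() if now is None else now
        if not self.is_due(now):
            return None
        partials = []
        # Assumption: workers are simulated sequentially here; a real
        # deployment would dispatch each batch to a separate machine.
        for batch in self.split(num_workers):
            partials.extend(self.map_task(task) for task in batch)
        self.last_run = now
        return self.reduce(partials)


class FeedAggregationJob(PeriodicOneRoundJob):
    """Toy news-aggregation job over in-memory feeds (made-up data)."""

    def __init__(self, feeds, period_seconds=3600):
        super().__init__(period_seconds)
        self.feeds = feeds  # maps feed name -> list of (timestamp, title)

    def split(self, num_workers):
        # Assign feed names to workers in round-robin order.
        names = sorted(self.feeds)
        return [names[i::num_workers] for i in range(num_workers)]

    def map_task(self, task):
        # A real worker would download and parse the feed here.
        return self.feeds[task]

    def reduce(self, partial_results):
        # Merge every worker's items into one list, newest first.
        items = [item for part in partial_results for item in part]
        return sorted(items, reverse=True)
```

In this sketch a new kind of media job only needs to override split, map_task, and reduce, while the scheduling logic stays in the base class. That loosely mirrors the idea of handling such jobs by extending a single class, although the real framework's interface may differ.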

Keywords: Distributed jobs framework, news aggregation, video conversion, email.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1124633

