{"title":"A Scalable Media Job Framework for an Open Source Search Engine","authors":"Pooja Mishra, Chris Pollett","volume":113,"journal":"International Journal of Computer and Information Engineering","pagesStart":876,"pagesEnd":884,"ISSN":"1307-6892","URL":"https:\/\/publications.waset.org\/pdf\/10004595","abstract":"This paper explores efficient ways to implement various
\r\nmedia-updating features like news aggregation, video conversion,
\r\nand bulk email handling. All of these jobs share the property
\r\nthat they are periodic in nature, and they all benefit from being
\r\nhandled in a distributed fashion. The data for these jobs also often
\r\ncomes from a social or collaborative source. We isolate the class of
\r\nperiodic, one-round map reduce jobs as a useful setting to describe
\r\nand handle media updating tasks. As such tasks are simpler than
\r\ngeneral map reduce jobs, programming them in a general map
\r\nreduce platform could easily become tedious. This paper presents
\r\na MediaUpdater module of the Yioop Open Source Search Engine
\r\nWeb Portal designed to handle such jobs via an extension of a
\r\nPHP class. We describe how to implement various media-updating
\r\ntasks in our system as well as experiments carried out using these
\r\nimplementations on an Amazon Web Services cluster.","references":"[1] S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D.\r\nRavichandran, and M. Aly. Video Suggestion and Discovery for YouTube:\r\nTaking Random Walks Through the View Graph. Proceedings of WWW\r\n2008.\r\n[2] Bash Reduce GitHub Page. Retrieved on Sep. 11, 2015 from\r\nhttps:\/\/github.com\/erikfrey\/bashreduce.\r\n[3] Krishna Bharat. And now, News. The Official Google Blog. Jan. 23,\r\n2006.\r\n[4] FFmpeg. Retrieved on Dec. 4, 2015 from\r\nhttp:\/\/ffmpeg.org\/.\r\n[5] W. Lam, L. Liu, S. Prasad, A. Rajaraman, Z. Vacheri, and A. Doan. Muppet:\r\nMapreduce-style processing of fast data. Proceedings of the VLDB\r\nEndowment (PVLDB), 5:1814\u20131825, 2012.\r\n[6] Leonardo Neumeyer, Bruce Robbins, Anish Nair, and Anand Kesari. S4:\r\nDistributed Stream Computing Platform. In Data Mining Workshops,\r\nInternational Conference. IEEE Computer Society. pp 170\u2013177. 2010.\r\n[7] P. O\u2019Connell. New Economy; Yahoo charts the spread of the news by\r\ne-mail, and what it finds out is itself becoming news. New York Times.\r\nJan. 29, 2001. http:\/\/www.nytimes.com\/2001\/01\/29\/business\/\r\nnew-economy-yahoo-charts-spread-e-mail-what-it-findsitself-\r\nbecoming.html\r\n[8] Oozie 4.2.0 Documentation. Retrieved on Sep. 11, 2015 from\r\nhttp:\/\/oozie.apache.org\/docs\/4.2.0.\r\n[9] Yioop Documentation from Seekquarry. Retrieved on Sep. 11, 2015 from\r\nhttp:\/\/www.seekquarry.com\/p\/Documentation.\r\n[10] A. Silberstein, J. Terrace, B. F. Cooper, R. Ramakrishnan. Feeding\r\nFrenzy: Selectively Materializing Users Event Feeds. In SIGMOD 2010.\r\n[11] Yahoo! Headline. Nov. 28, 1996. Internet Archive.\r\nhttps:\/\/web.archive.org\/web\/19961128074525\/http:\/\/www8.yahoo.com\/\r\nheadlines\/","publisher":"World Academy of Science, Engineering and Technology","index":"Open Science Index 113, 2016"}