Arabic Light Stemmer for Better Search Accuracy

Authors: Sahar Khedr, Dina Sayed, Ayman Hanafy

Abstract:

Arabic is one of the oldest and most important languages in the world. It has more than 250 million native speakers and is an official language in more than twenty countries. The past decade has seen rapid evolution in smart devices, social networks, and the technology sector, which has created a need for tools and libraries that handle the Arabic language properly across different domains. Stemming is a fundamental linguistic operation used in many applications, especially in information extraction and text mining. The motivation behind this work is to enhance the Arabic light stemmer so that it serves the data mining industry and can be leveraged in the open source community. The presented implementation enhances the Arabic light stemmer by extending an existing algorithm with a new set of rules and patterns, accompanied by an adjusted procedure. The study reports an average 10% improvement in search accuracy compared with previous work.
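To illustrate what light stemming means in general, the following is a minimal Python sketch of affix stripping. It is not the algorithm presented in this paper: the affix lists, the function name `light_stem`, and the `min_len` threshold are assumptions chosen for illustration, loosely modeled on classic Arabic light stemmers, and they do not include the extended rules and patterns the abstract refers to.

```python
# Illustrative affix lists (assumed for this sketch). They are NOT the
# extended rule set described in the abstract.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str, min_len: int = 3) -> str:
    """Strip one prefix and then listed suffixes, keeping at least min_len letters."""
    # Try longer prefixes first so that "وال" wins over "ال".
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break  # remove at most one prefix

    # Walk the suffix list once, in order, removing each suffix that matches
    # as long as the remaining word stays long enough.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[: -len(suffix)]

    return word


if __name__ == "__main__":
    for w in ["المكتبات", "والكتاب", "كتاب"]:
        print(w, "->", light_stem(w))
```

In a search setting, both the indexed text and the query terms would pass through the same function, so that inflected forms such as "المكتبات" and "مكتب" map to a common index term. No root extraction or pattern matching is attempted here, which is what distinguishes a light stemmer from a root-based one.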

Keywords: Arabic data mining, Arabic information extraction, Arabic light stemmer, Arabic stemmer.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1127218

