Saudi Twitter Corpus for Sentiment Analysis

Authors: Adel Assiri, Ahmed Emam, Hmood Al-Dossari

Abstract:

Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have directly applied SA to Arabic, owing to the lack of publicly available datasets for the language. This paper partially bridges this gap by focusing on one Arabic dialect, the Saudi dialect. It presents an annotated dataset of 4700 tweets for Saudi dialect sentiment analysis, with an inter-annotator agreement of K = 0.807. Future work will extend this corpus and build a large-scale lexicon for the Saudi dialect from it.
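For readers unfamiliar with the K statistic quoted above, the following is a minimal sketch of how a kappa agreement score can be computed. It assumes Cohen's kappa for two annotators; the abstract does not state which kappa variant or how many annotators were used, and the function name `cohen_kappa`, the label set, and the toy labels are hypothetical illustrations rather than data from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators and nominal labels; the paper's actual
    annotation setup may differ.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels (hypothetical, not taken from the corpus).
annotator_1 = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "neu", "pos", "neg"]
annotator_2 = ["pos", "pos", "neg", "neu", "neu", "pos", "neg", "neg", "pos", "neg"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 4))  # 0.6875
```

In this toy case the annotators agree on 8 of 10 items (observed agreement 0.8) and the chance agreement is 0.36, giving (0.8 - 0.36) / (1 - 0.36) = 0.6875. The reported K = 0.807 would correspond to a higher level of agreement on the same scale.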

Keywords: Arabic, Sentiment Analysis, Twitter, annotation.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1338816

