Arabic Word Semantic Similarity
Authors: Faaza A. Almarsoomi, James D. O'Shea, Zuhair A. Bandar, Keeley A. Crockett
Abstract:
This paper presents the production of an Arabic word semantic similarity (WSS) benchmark dataset. It is the first dataset of its kind for Arabic, developed specifically to assess the accuracy of word semantic similarity measures. Semantic similarity is an essential component of numerous applications in fields such as natural language processing, artificial intelligence, linguistics, and psychology. Most of the reported work has been done for English; to the best of our knowledge, no word similarity measure has been developed specifically for Arabic. In this paper, an Arabic benchmark dataset of 70 word pairs is presented. New methods and the best available techniques were used to produce the dataset: selecting and creating materials, collecting human ratings from a representative sample of participants, and calculating the overall ratings. This dataset will make a substantial contribution to future work in the field of Arabic WSS, and we hope it will serve as a reference basis for evaluating and comparing different methodologies in the field.
Keywords: Arabic categories, benchmark dataset, semantic similarity, word pair, stimulus Arabic words
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1080052
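As an illustration of how a benchmark of this kind is typically used, the following Python sketch aggregates per-participant ratings into an overall rating for each word pair and then scores a similarity measure against those ratings. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not say how the overall ratings are calculated or which agreement statistic is used, so the arithmetic mean and Pearson correlation are assumed here. The pair identifiers, rating values, and function names (`overall_ratings`, `pearson`, `evaluate`) are hypothetical placeholders and are not taken from the dataset.

```python
from math import sqrt
from statistics import mean

# Hypothetical ratings: each word pair maps to the scores given by individual
# participants. Pair names, scale, and values are placeholders, not paper data.
human_ratings = {
    "pair_1": [3.8, 4.0, 3.6],
    "pair_2": [2.0, 2.4, 1.6],
    "pair_3": [0.2, 0.0, 0.4],
    "pair_4": [1.0, 1.2, 0.8],
}


def overall_ratings(ratings):
    """Average the participants' scores per word pair (assumed aggregation)."""
    return {pair: mean(scores) for pair, scores in ratings.items()}


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def evaluate(measure_scores, ratings):
    """Correlate a measure's scores with the averaged human ratings."""
    pairs = sorted(ratings)  # fixed order so both lists line up
    benchmark = overall_ratings(ratings)
    return pearson(
        [measure_scores[p] for p in pairs],
        [benchmark[p] for p in pairs],
    )


# Hypothetical output of some similarity measure on the same word pairs.
measure_scores = {"pair_1": 0.9, "pair_2": 0.5, "pair_3": 0.1, "pair_4": 0.3}

r = evaluate(measure_scores, human_ratings)
print(f"Pearson r = {r:.3f}")
```

A higher correlation would indicate closer agreement between the measure and the human judgments. A rank-based statistic such as Spearman's correlation could be substituted if only the ordering of pairs matters.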