Enhancing Retrieval Effectiveness of Malay Documents by Exploiting Implicit Semantic Relationship between Words
Authors: Mohd Pouzi Hamzah, Tengku Mohd Tengku Sembok
Abstract:
Phrases have a long history in information retrieval, particularly in commercial systems. Implicit semantic relationships between words, in the form of base noun phrases (BaseNP), have shown significant improvements in precision in many IR studies. Our research focuses on linguistic phrases, which are language dependent. Our results show that using BaseNP can improve retrieval performance, even though more than 62% of word formation in the Malay language is based on derivational affixes and suffixes.
Keywords: Information Retrieval, Malay Language, Semantic Relationship, Retrieval Effectiveness, Conceptual Indexing.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1062042