Distributional Semantics Approach to Thai Word Sense Disambiguation
Authors: Sunee Pongpinigpinyo, Wanchai Rivepiboon
Abstract:
Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Several strategies can resolve word ambiguity with a reasonable degree of accuracy: knowledge-based, corpus-based, and hybrid. This paper focuses on the corpus-based strategy, using an unsupervised learning method for disambiguation. We report our investigation of applying Latent Semantic Indexing (LSI), an unsupervised information retrieval technique, to the task of Thai noun and verb word sense disambiguation. LSI has been shown to be efficient and effective for information retrieval. We report experiments on two Thai polysemous words, /hua4/ and /kep1/, used as representatives of Thai nouns and verbs respectively. The results demonstrate the effectiveness of the approach and indicate the potential of applying vector-based distributional information measures to semantic disambiguation.
Keywords: Distributional semantics, Latent Semantic Indexing, natural language processing, Polysemous words, unsupervised learning, Word Sense Disambiguation.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1079186
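The abstract names LSI and vector-based distributional measures but gives no procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It builds a term-by-context matrix, reduces it with a truncated SVD, and compares contexts by cosine similarity in the latent space. Everything in it is an assumption made for illustration: the English stand-in contexts, the target word, the sense labels, the raw-count weighting, the rank k = 2, and the helper names (`term_vector`, `to_latent`, `nearest_sense`). The labels are used only to read off the nearest neighbour, not to build the space.

```python
import numpy as np

TARGET = "head"  # hypothetical ambiguous word (English stand-in, not the paper's data)

# Invented toy contexts; the labels are only used to interpret neighbours.
CONTEXTS = [
    ("doctor examined patient head injury", "body"),
    ("helmet protects head injury accident", "body"),
    ("patient hospital doctor head accident", "body"),
    ("head department manager company", "chief"),
    ("company manager appointed head office", "chief"),
    ("department office manager head meeting", "chief"),
]


def tokenize(text):
    """Split on whitespace and drop the target word itself."""
    return [w for w in text.split() if w != TARGET]


vocab = sorted({w for text, _ in CONTEXTS for w in tokenize(text)})
index = {w: i for i, w in enumerate(vocab)}


def term_vector(text):
    """Raw term-count vector over the vocabulary; unknown words are skipped."""
    v = np.zeros(len(vocab))
    for w in tokenize(text):
        if w in index:
            v[index[w]] += 1.0
    return v


# Term-by-context matrix (rows: terms, columns: contexts).
A = np.column_stack([term_vector(text) for text, _ in CONTEXTS])

# Truncated SVD: keep the k largest singular directions (k is an assumption).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]


def to_latent(v):
    """Project a term vector into the k-dimensional latent space."""
    return v @ U_k


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


latent_contexts = [to_latent(A[:, j]) for j in range(A.shape[1])]


def nearest_sense(text):
    """Return the label of the toy context closest to `text` in latent space."""
    q = to_latent(term_vector(text))
    sims = [cosine(q, c) for c in latent_contexts]
    return CONTEXTS[int(np.argmax(sims))][1]
```

For example, `nearest_sense("nurse patient injury hospital")` picks a context from the "body" group, and `nearest_sense("manager company meeting")` picks one from the "chief" group. In this toy data the two groups share no vocabulary once the target word is removed, so the separation is far cleaner than it would be on a real corpus.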