Estimating Word Translation Probabilities for Thai – English Machine Translation using EM Algorithm

Authors: Chutchada Nusai, Yoshimi Suzuki, Haruaki Yamazaki

Abstract:

Selecting a word translation from a set of target-language words, one that conveys the correct sense of the source word and yields more fluent target-language output, is one of the core problems in machine translation. In this paper we compare three methods of estimating word translation probabilities for selecting the translation word in Thai – English machine translation: (1) a method based on the frequency of word translations, (2) a method based on the collocation of word translations, and (3) a method based on the Expectation Maximization (EM) algorithm. For evaluation we used Thai – English parallel sentences generated by NECTEC. The EM-based method performed best of the three and gave satisfactory results.
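As an illustration of how EM can estimate word translation probabilities from parallel sentences, the following is a minimal sketch in the style of IBM Model 1 (without a NULL word). The abstract does not specify the authors' exact formulation, so this should not be read as their method. The function name `train_ibm1`, the romanized Thai tokens, and the toy sentence pairs are all hypothetical and are not taken from the NECTEC data.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=20):
    """Estimate t(e | f) by EM, IBM Model 1 style (assumption: no NULL word).

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Returns a nested dict t[f][e] = P(target word e | source word f).
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    tgt_vocab = {e for _, es in pairs for e in es}

    # Uniform start: every target word is equally likely for every source word.
    t = {f: {e: 1.0 / len(tgt_vocab) for e in tgt_vocab} for f in src_vocab}

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)

        # E-step: distribute each target word over the source words of its
        # sentence, in proportion to the current translation probabilities.
        for fs, es in pairs:
            for e in es:
                z = sum(t[f][e] for f in fs)
                for f in fs:
                    c = t[f][e] / z
                    count[f][e] += c
                    total[f] += c

        # M-step: renormalize the expected counts per source word.
        t = {f: {e: count[f][e] / total[f] for e in tgt_vocab}
             for f in src_vocab}
    return t


# Hypothetical toy corpus: romanized Thai tokens paired with English tokens.
toy_pairs = [
    ("maew kin plaa".split(), "cat eats fish".split()),
    ("maa kin".split(), "dog eats".split()),
    ("maew".split(), "cat".split()),
]

probs = train_ibm1(toy_pairs)
for f in sorted(probs):
    best = max(probs[f], key=probs[f].get)
    print(f, "->", best, round(probs[f][best], 3))
```

A frequency-based baseline would instead pick the target word that co-occurs most often with the source word. EM differs in that each target word's count is shared among the competing source words in its sentence and re-estimated over several passes.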

Keywords: Machine translation, EM algorithm.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1075810

