Knowledge Required for Avoiding Lexical Errors at Machine Translation

Authors: Yukiko Sasaki Alam


This research aims to identify the causes of wrong lexical selections in machine translation (MT), rather than to categorize lexical errors, which has been a common practice in error analysis. By manually examining and analyzing lexical errors produced by an MT system, it suggests what knowledge would help the system reduce such errors.

Keywords: machine translation, error analysis, causes of errors, output evaluation


