Determining the Gender of Korean Names for Pronoun Generation

Authors: Seong-Bae Park, Hee-Geun Yoon

Abstract:

Correctly classifying the gender of names is an important task in Korean-English machine translation. When a sentence consists of two or more clauses and only one subject is given as a proper noun, the gender of that proper noun must be known to translate the sentence correctly. This is because a singular pronoun carries gender in English but not in Korean. Thus, in Korean-English machine translation, the gender of a proper noun has to be determined. More generally, this task can be extended to gender classification of Korean names in general. This paper proposes a statistical method for this problem. By treating a name simply as a sequence of syllables, it is possible to obtain statistics for each name from a collection of names. An evaluation shows that the proposed method improves accuracy over a simple lookup in the collection: the lookup method achieves 64.11% accuracy, whereas the proposed method achieves 81.49%. This suggests that the proposed method is better suited to gender classification of Korean names.
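The idea of treating a name as a sequence of syllables can be illustrated with a minimal sketch. The abstract does not specify the statistical model, so the code below assumes a simple position-aware syllable model with add-one smoothing (a naive Bayes style scorer); it is not the authors' method. The names `train`, `classify`, and `TOY_NAMES` are hypothetical, and the six labelled names are made-up toy data, not the collection used in the paper. The sketch relies on the fact that each precomposed Hangul character is one syllable, so iterating over an NFC-normalized string yields syllables.

```python
import math
import unicodedata
from collections import Counter

# Toy labelled given names (illustrative only, not the paper's data).
TOY_NAMES = [
    ("민준", "M"), ("서준", "M"), ("지훈", "M"),
    ("서연", "F"), ("지은", "F"), ("수연", "F"),
]

def train(names):
    """Count (position, syllable) occurrences per gender."""
    syllables = {"M": Counter(), "F": Counter()}
    prior = Counter()
    for name, gender in names:
        name = unicodedata.normalize("NFC", name)
        prior[gender] += 1
        for pos, syl in enumerate(name):
            syllables[gender][(pos, syl)] += 1
    return syllables, prior

def classify(name, model, alpha=1.0):
    """Return the gender with the higher smoothed log-probability."""
    syllables, prior = model
    name = unicodedata.normalize("NFC", name)
    # Number of distinct (position, syllable) features, plus one slot
    # reserved for features never seen in training.
    vocab_size = len(set(syllables["M"]) | set(syllables["F"])) + 1
    n_names = sum(prior.values())
    scores = {}
    for gender in ("M", "F"):
        total = sum(syllables[gender].values())
        score = math.log((prior[gender] + alpha) / (n_names + 2 * alpha))
        for pos, syl in enumerate(name):
            count = syllables[gender][(pos, syl)]
            score += math.log((count + alpha) / (total + alpha * vocab_size))
        scores[gender] = score
    return max(scores, key=scores.get)

model = train(TOY_NAMES)
print(classify("민훈", model))  # name absent from the toy collection
print(classify("수은", model))  # name absent from the toy collection
```

Neither query name appears in the toy collection, so an exact lookup would return nothing for them, while the syllable statistics still produce a prediction. This is one plausible reading of why a syllable-based method can outperform plain lookup.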

Keywords: machine translation, natural language processing, gender of proper nouns, statistical method

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1061842

