Parallel Text Processing: Alignment of Indonesian to Javanese Language
Authors: Aji P. Wibawa, Andrew Nafalski, Neil Murray, Wayan F. Mahmudy
Abstract:
Parallel text alignment is proposed as a way of aligning Bahasa Indonesia to words in Javanese. Because a one-to-one word translator cannot convey the pragmatic aspects of Javanese, the parallel text alignment model described here uses phrase pair combination. The algorithm aligns the parallel text automatically from the beginning to the end of each sentence. Although phrase pair combination outperforms the previous algorithm, it is still inefficient: recording all possible combinations consumes more database space and is time consuming. The original algorithm is therefore modified by applying the edit distance coefficient to improve data-storage efficiency. As a result, data-storage consumption is reduced by 90%, and the learning period is reduced as well (42 s).
Keywords: Parallel text alignment, phrase pair combination, edit distance coefficient, Javanese-Indonesian language.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1335958
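The abstract does not spell out the algorithm, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes that "phrase pair combination" means pairing every contiguous source phrase with every contiguous target phrase, and that the "edit distance coefficient" can be approximated by a normalised Levenshtein similarity used to discard unlikely pairs before storage. The function names, the `max_len` and `threshold` parameters, and the example sentence pair are all hypothetical and are not taken from the paper.

```python
from itertools import product


def phrases(tokens, max_len=3):
    """Return every contiguous phrase of up to max_len tokens, as strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    ]


def phrase_pairs(source, target, max_len=3):
    """Exhaustively pair each source phrase with each target phrase.

    Assumption: this stands in for the 'phrase pair combination' step.
    """
    return list(product(phrases(source.split(), max_len),
                        phrases(target.split(), max_len)))


def levenshtein(a, b):
    """Standard dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalised similarity in [0, 1]; 1.0 means identical strings.

    Assumption: a stand-in for the paper's 'edit distance coefficient'.
    """
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def filter_pairs(pairs, threshold=0.5):
    """Keep only pairs whose similarity reaches the (hypothetical) threshold."""
    return [(s, t) for s, t in pairs if similarity(s, t) >= threshold]


if __name__ == "__main__":
    # Illustrative Indonesian / Javanese (ngoko) sentence pair, not from the paper.
    pairs = phrase_pairs("saya makan nasi", "aku mangan sega")
    kept = filter_pairs(pairs)
    print(len(pairs), "candidate pairs,", len(kept), "kept after filtering")
```

The exhaustive pairing grows with the product of the two phrase counts, which is the storage cost the abstract refers to. A spelling-based filter such as this one only helps for related word forms, so it should be read as an illustration of the storage trade-off rather than as the authors' method.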