Edit Distance Algorithm to Increase Storage Efficiency of Javanese Corpora

Authors: Aji P. Wibawa, Andrew Nafalski, Neil Murray, Wayan F. Mahmudy

Abstract:

Because a one-to-one word translator cannot handle the pragmatic aspects of Javanese, the parallel text alignment model described here uses phrase pair combinations. The algorithm aligns the parallel text automatically, from the beginning to the end of each sentence. Although the phrase pair combination outperforms the previous algorithm, it is still inefficient: recording every possible combination consumes more database space and is time-consuming. The original algorithm is therefore modified by applying the edit distance coefficient to improve data-storage efficiency. As a result, data-storage consumption is reduced by 90%, and the learning period is shortened as well (42 s).
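The storage problem and the role of an edit distance measure can be illustrated with a minimal sketch. The abstract does not give the authors' exact formulation, so everything below is an assumption: the coefficient is taken to be the Levenshtein distance normalised by the longer string's length, phrases are taken to be contiguous word spans, and `filter_pairs` with its `threshold` parameter is a hypothetical pruning rule, not the published criterion.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_distance_coefficient(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 when every
    character position differs. The paper's definition may differ."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def phrases(sentence: str) -> list[str]:
    """All contiguous word spans of a sentence (an assumed phrase definition)."""
    words = sentence.split()
    return [" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)]


def phrase_pair_combinations(source: str, target: str) -> list[tuple[str, str]]:
    """Every source phrase paired with every target phrase."""
    return list(product(phrases(source), phrases(target)))


def filter_pairs(pairs, threshold: float = 0.5):
    """Hypothetical pruning rule: keep only pairs whose coefficient
    reaches the threshold, so fewer candidates need to be stored."""
    return [(s, t) for s, t in pairs
            if edit_distance_coefficient(s, t) >= threshold]
```

Under these assumptions, a sentence of n words yields n(n+1)/2 phrases, so a pair of sentences with n and m words produces n(n+1)/2 × m(m+1)/2 candidate pairs. That quadratic growth on each side is the kind of cost that makes storing every combination expensive, and any coefficient-based filter trades some recall for a smaller table.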

Keywords: edit distance coefficient, Javanese, parallel text alignment, phrase pair combination

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1332892

