N-Grams: A Tool for Repairing Word Order Errors in Ill-formed Texts
Authors: Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou, Konstantinos Mamouras
Abstract:
This paper presents an approach for repairing word order errors in English text by reordering the words of a sentence and choosing the version that maximizes the number of trigram hits according to a language model. One way to reorder the words is to generate all permutations, but a sentence of N words has N! of them. The novelty of the method lies in an efficient confusion matrix technique for reordering the words, designed to reduce the search space among permuted sentences. The search space is restricted using the statistical inference of N-grams. The results show that the number of permuted sentences can be reduced by 98.16%. In experiments on a test set of TOEFL sentences, more than 95% of the sentences could be repaired using the proposed method.
Keywords: Permutations filtering, Statistical language model N-grams, Word order errors, TOEFL
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1077086
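The idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the confusion matrix is built, so the version below is an assumption: a Boolean matrix whose entry (i, j) is true when word i followed by word j is an attested bigram, used to discard permutations before trigram scoring. The toy trigram set and all function names are hypothetical and stand in for a real language model.

```python
from itertools import permutations

# Hypothetical toy "language model": a set of attested trigrams.
TRIGRAMS = {
    ("i", "have", "never"),
    ("have", "never", "seen"),
    ("never", "seen", "it"),
}
# Bigrams implied by the trigrams above (assumption: no separate bigram model).
BIGRAMS = {(a, b) for a, b, _ in TRIGRAMS} | {(b, c) for _, b, c in TRIGRAMS}


def trigram_hits(words):
    """Count the consecutive word triples that appear in the trigram set."""
    return sum(tuple(words[i:i + 3]) in TRIGRAMS for i in range(len(words) - 2))


def repair_brute_force(words):
    """Score all N! permutations and return the one with the most trigram hits."""
    return max(permutations(words), key=trigram_hits)


def admissible_orders(words):
    """Yield only the orderings whose adjacent pairs are all attested bigrams."""
    n = len(words)
    # matrix[i][j] is True when words[i] may be directly followed by words[j].
    matrix = [[i != j and (words[i], words[j]) in BIGRAMS for j in range(n)]
              for i in range(n)]

    def extend(path, used):
        if len(path) == n:
            yield tuple(words[i] for i in path)
            return
        for j in range(n):
            if j not in used and (not path or matrix[path[-1]][j]):
                yield from extend(path + [j], used | {j})

    yield from extend([], frozenset())


def repair_filtered(words):
    """Return (best ordering, number of candidates scored) after filtering."""
    candidates = list(admissible_orders(words))
    if not candidates:
        # Nothing survives the filter: leave the sentence unchanged.
        return tuple(words), 0
    return max(candidates, key=trigram_hits), len(candidates)


sentence = ["i", "never", "have", "seen", "it"]
best, scored = repair_filtered(sentence)
print(" ".join(best), "| candidates scored:", scored, "of 120 permutations")
```

On this toy input both functions select the same ordering, but the filtered search scores far fewer candidates than the 5! = 120 permutations examined by brute force. The reduction obtained here depends entirely on the toy data and says nothing about the 98.16% figure reported in the abstract.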