Words Reordering based on Statistical Language Model

Theologos Athanaselis; Stelios Bakamidis; Ioannis Dologlou

Abstract:

There are multiple reasons to expect that detecting word order errors in a text is a difficult problem, and the detection rates reported in the literature are in fact low. Although grammatical rules constructed by computational linguists improve the performance of grammar checkers in word order diagnosis, the repair task remains very difficult. This paper presents an approach for repairing word order errors in English text by reordering the words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. The novelty of this method lies in the use of an efficient confusion matrix technique for reordering the words. The comparative advantage of this method is that it works with a large set of words and avoids the laborious and costly process of collecting word order errors to create error patterns.
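
The core idea of the abstract, scoring each candidate word order by its number of trigram hits and keeping the best one, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the confusion matrix filtering is not described here, so the sketch substitutes a brute-force enumeration of all permutations, which is only feasible for short sentences. The toy corpus, the function names, and the use of a plain set of observed trigrams in place of a trained language model are all assumptions made for illustration.

```python
from itertools import permutations


def trigrams(words):
    """Return the consecutive word triples of a word sequence."""
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def build_trigram_set(corpus_sentences):
    """Collect every trigram seen in a toy corpus (stand-in for a language model)."""
    seen = set()
    for sentence in corpus_sentences:
        seen.update(trigrams(sentence.lower().split()))
    return seen


def trigram_hits(words, known_trigrams):
    """Count how many trigrams of the word sequence occur in the known set."""
    return sum(1 for t in trigrams(words) if t in known_trigrams)


def best_reordering(sentence, known_trigrams):
    """Return the permutation of the sentence's words with the most trigram hits.

    Brute force over n! candidates; an assumption of this sketch, not the
    filtering technique the paper proposes.
    """
    words = sentence.lower().split()
    best = max(permutations(words), key=lambda p: trigram_hits(p, known_trigrams))
    return " ".join(best)


# Hypothetical toy corpus, used only to make the example self-contained.
TOY_CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]
KNOWN = build_trigram_set(TOY_CORPUS)

repaired = best_reordering("cat the sat mat the on", KNOWN)
print(repaired)
```

With this toy corpus the scrambled input has zero trigram hits, while the ordering "the cat sat on the mat" has four, so it is the one selected. A realistic system would need a language model trained on a large corpus and a way to prune the permutation space, which is the role the abstract assigns to the confusion matrix technique.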

Keywords: Permutations filtering, Statistical language model N-grams, Word order errors

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1056066
