Tagging by Combining Rules-Based Method and Memory-Based Learning

Authors: Tlili-Guiassa Yamina

Abstract:

Many natural language expressions are ambiguous and can be interpreted only by drawing on other sources of information. Whether the word تعاون is read as a noun or a verb, for example, depends on the presence of contextual cues, so interpreting words requires discriminating between their different usages. This paper proposes a hybrid of a rule-based method and a machine learning method for tagging Arabic words. An Arabic word may be composed of a stem plus affixes and clitics (affixes include inflectional markers for tense, gender, and number; clitics include some prepositions, conjunctions, and others), and because of this particularity a small number of rules dominates performance. Tagging is closely related to the notion of word class used in syntax. The method first applies rules that consider post-position, word endings, and patterns; the resulting anomalies are then corrected with memory-based learning (MBL). Memory-based learning is an efficient way to integrate various sources of information and to handle exceptional data in natural language processing tasks. In this second step, the exceptional cases of the rules are checked and more information is made available to the learner for treating them. A number of experiments were run both to evaluate the proposed method and to examine the importance of the various sources of information in learning.
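
The two-step idea described in the abstract (rules first, then memory-based correction of the cases the rules get wrong) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the rules, the feature set, the overlap distance, the threshold, the tag names, and the Latin transliterations are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical transliterated prepositions; a real system would use a fuller list.
PREPOSITIONS = {"fy", "mn", "ElY"}


def rule_tag(prev_word, word):
    """Toy rules using post-position, prefix, and ending cues (illustrative only)."""
    if prev_word in PREPOSITIONS:
        return "NOUN"  # a word following a preposition is taken as a noun
    if word.startswith("Al") or word.endswith("p"):
        return "NOUN"  # definite article or feminine ending
    if word[:1] in {"y", "t", "n"}:
        return "VERB"  # imperfective-like prefix
    return "NOUN"      # default tag


def features(prev_word, word):
    """Feature tuple stored in memory: context, prefix, ending, and the rule's guess."""
    return (prev_word, word[:2], word[-2:], rule_tag(prev_word, word))


class HybridTagger:
    """Rules first; a memory of rule exceptions may override the rule's tag."""

    def __init__(self, threshold=1):
        self.memory = []            # list of (feature tuple, correct tag)
        self.threshold = threshold  # maximum number of mismatching features

    def train(self, examples):
        # Keep only the instances the rules tag incorrectly (the exceptions).
        for prev_word, word, gold in examples:
            if rule_tag(prev_word, word) != gold:
                self.memory.append((features(prev_word, word), gold))

    def tag(self, prev_word, word):
        guess = rule_tag(prev_word, word)
        if not self.memory:
            return guess
        feats = features(prev_word, word)

        def distance(stored):
            # Overlap metric: count the feature positions that differ.
            return sum(a != b for a, b in zip(feats, stored))

        best = min(distance(f) for f, _ in self.memory)
        if best > self.threshold:
            return guess  # no stored exception is close enough; trust the rules
        votes = Counter(t for f, t in self.memory if distance(f) == best)
        return votes.most_common(1)[0][0]


tagger = HybridTagger()
tagger.train([("h*A", "tEAwn", "NOUN"), ("lm", "yktb", "VERB")])
print(tagger.tag("h*A", "tEAwn"), tagger.tag("lm", "yktb"))
```

In this sketch only the first training instance is stored, because the toy rules already tag the second one correctly. Storing just the exceptions keeps the memory small and leaves the rules in charge wherever they are reliable.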

Keywords: Arabic language, Rule-based method, Exceptions, Memory-based learning, Tagging.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1074643

