Highlighting Document's Structure

Sylvie Ratté; Wilfried Njomgue; Pierre-André Ménard

Abstract:

In this paper, we present symbolic recognition models for extracting the knowledge conveyed by document structures. By focusing on the extraction and careful exploitation of a document's semantic structure, we obtain meaningful contextual tags corresponding to different unit types (title, chapter, section, enumeration, etc.).
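
To make the idea of contextual tagging by unit type concrete, the following is a minimal sketch and not the authors' models. It assumes that simple regular expressions can stand in for the symbolic grammars, that the first non-empty line is the title, and that the input is plain text with one unit per line. The names RULES, tag_line, and tag_document are hypothetical and introduced only for illustration.

```python
import re

# Hypothetical rule set: each unit type is paired with a regular expression.
# These patterns are illustrative assumptions, not the grammars of the paper.
RULES = [
    ("chapter", re.compile(r"^chapter\s+\d+\b", re.IGNORECASE)),
    ("section", re.compile(r"^\d+(\.\d+)+\s+\S")),
    ("enumeration", re.compile(r"^(?:[-*•]|\d+[.)]|[a-z][.)])\s+\S")),
]


def tag_line(line, is_first):
    """Return the unit type for one line, or 'paragraph' if no rule matches."""
    text = line.strip()
    if is_first:
        # Assumption: the first non-empty line of the document is its title.
        return "title"
    for unit_type, pattern in RULES:
        if pattern.match(text):
            return unit_type
    return "paragraph"


def tag_document(text):
    """Tag every non-empty line of a plain-text document with a unit type."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return [(tag_line(ln, i == 0), ln.strip()) for i, ln in enumerate(lines)]
```

Calling tag_document on a short text returns a list of (unit type, line) pairs in document order. Rules are tried in the order listed, so a numbered heading such as "1.1 Background" is tagged as a section before the enumeration rule is considered.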

Keywords: Information retrieval, document structures, symbolic grammars.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1084163

