Semantic Indexing Approach of a Corpora Based On Ontology

Mohammed Erritali

Abstract:

For centuries, the growing volume of text held in libraries, such as books and articles, has made effective mechanisms for locating documents a necessity. Early techniques such as abstracting, indexing, and the use of classification categories marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of designing models and systems that facilitate access to a set of documents in electronic form (a corpus), so that a user can find the relevant ones, that is, the content that matches the user's information need. This paper presents a new semantic indexing approach for a document corpus. The indexing process starts with a term-weighting phase that determines the importance of each term in the documents. A thesaurus such as WordNet is then used to move from terms to the conceptual level. Each candidate concept is evaluated by determining how well it represents the document, that is, its importance relative to the other concepts of the document. Finally, the semantic index is constructed by attaching to each concept of the ontology the documents of the corpus in which that concept is found.
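
The four stages named in the abstract (term weighting, mapping terms to concepts, scoring each concept relative to the others in a document, and attaching documents to concepts) can be illustrated with a minimal sketch. It is not the paper's implementation. The abstract does not specify the weighting or scoring formulas, so the smoothed TF-IDF weighting and the normalized concept score below are assumptions. The small `TOY_THESAURUS` dictionary is a hypothetical stand-in for WordNet, and all function names are illustrative.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy thesaurus standing in for WordNet: term -> concept label.
TOY_THESAURUS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "dog": "animal",
    "cat": "animal",
}


def tf_idf(docs):
    """Return {doc_id: {term: weight}} using a smoothed TF-IDF (an assumed scheme)."""
    n_docs = len(docs)
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    weights = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        weights[doc_id] = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
    return weights


def concept_scores(term_weights, thesaurus):
    """Map weighted terms to concepts and score each concept relative to
    the other concepts found in the same document (scores sum to 1)."""
    raw = defaultdict(float)
    for term, weight in term_weights.items():
        concept = thesaurus.get(term)
        if concept is not None:
            raw[concept] += weight
    total = sum(raw.values())
    if total == 0:
        return {}
    return {concept: score / total for concept, score in raw.items()}


def build_semantic_index(docs, thesaurus=TOY_THESAURUS):
    """Return {concept: [(doc_id, score), ...]}, best-scoring documents first."""
    index = defaultdict(list)
    for doc_id, term_weights in tf_idf(docs).items():
        for concept, score in concept_scores(term_weights, thesaurus).items():
            index[concept].append((doc_id, score))
    for postings in index.values():
        postings.sort(key=lambda posting: posting[1], reverse=True)
    return dict(index)


if __name__ == "__main__":
    corpus = {
        "d1": "the car and the truck",
        "d2": "a dog chased a cat",
        "d3": "the dog sat in the automobile",
    }
    for concept, postings in build_semantic_index(corpus).items():
        print(concept, postings)
```

In this sketch, a document that mentions only vehicle terms gets a score of 1 for the concept "vehicle", while a document that mixes vehicle and animal terms splits its score between the two concepts. A real system would replace the dictionary lookup with WordNet synset lookup and word-sense disambiguation.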

Keywords: Semantic, indexing, corpora, WordNet, ontology.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1106227
