ORank: An Ontology Based System for Ranking Documents

Mehrnoush Shamsfard; Azadeh Nematzadeh; Sarah Motiee

Abstract:

The growing volume of information on the internet increases the need for new (semi-)automatic methods that retrieve documents and rank them according to their relevance to the user query. In this paper, after a brief review of ranking models, a new ontology-based approach for ranking HTML documents is proposed and evaluated under various conditions. Our approach combines conceptual, statistical and linguistic methods. This combination preserves ranking precision without losing speed. The approach uses natural language processing techniques to extract phrases and stem words. An ontology-based conceptual method is then used to annotate documents and expand the query. To expand a query, the spread activation algorithm is improved so that the expansion can be carried out along various aspects. The annotated documents and the expanded query are then processed with statistical methods to compute the relevance degree. The outstanding features of our approach are (1) combining conceptual, statistical and linguistic features of documents, (2) expanding the query with its related concepts before comparing it to documents, (3) extracting and using both words and phrases to compute the relevance degree, (4) improving the spread activation algorithm so that the expansion is based on a weighted combination of different conceptual relationships and (5) allowing variable document vector dimensions. A ranking system called ORank was developed to implement and test the proposed model. The test results are presented at the end of the paper.
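The query expansion and relevance steps described in the abstract can be illustrated with a minimal sketch. It is not ORank's implementation: the abstract does not give the relation types, their weights, the decay and threshold values, or the scoring formula, so the toy ontology, `RELATION_WEIGHTS`, `expand_query`, `cosine` and `rank` below are all hypothetical, and cosine similarity over sparse vectors is used only as a common stand-in for the statistical relevance computation.

```python
import math

# Hypothetical relation weights; the actual weights are not given in the abstract.
RELATION_WEIGHTS = {"is_a": 0.8, "part_of": 0.6, "related_to": 0.4}

# Hypothetical toy ontology: concept -> list of (relation, neighbouring concept).
ONTOLOGY = {
    "car": [("is_a", "vehicle"), ("part_of", "traffic")],
    "vehicle": [("related_to", "transport")],
    "engine": [("part_of", "car")],
}


def expand_query(terms, ontology, weights, decay=0.5, threshold=0.1, max_hops=2):
    """Spread activation from the query terms along weighted relations.

    Each hop passes activation * relation weight * decay to a neighbour.
    Activation below the threshold is dropped, which bounds the expansion.
    """
    activation = {term: 1.0 for term in terms}
    frontier = dict(activation)
    for _ in range(max_hops):
        next_frontier = {}
        for concept, act in frontier.items():
            for relation, neighbour in ontology.get(concept, []):
                passed = act * weights.get(relation, 0.0) * decay
                if passed < threshold:
                    continue
                # Keep the strongest activation seen for each concept.
                if passed > activation.get(neighbour, 0.0):
                    activation[neighbour] = passed
                    next_frontier[neighbour] = passed
        frontier = next_frontier
    return activation


def cosine(query_vec, doc_vec):
    """Cosine similarity over sparse dict vectors of any dimension."""
    dot = sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def rank(query_terms, docs):
    """Expand the query, score every annotated document, sort by score."""
    expanded = expand_query(query_terms, ONTOLOGY, RELATION_WEIGHTS)
    scored = [(name, cosine(expanded, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical annotated documents: concept -> weight, with differing dimensions.
DOCS = {
    "d1": {"vehicle": 1.0, "transport": 1.0},
    "d2": {"car": 2.0},
    "d3": {"banana": 1.0},
}

if __name__ == "__main__":
    for name, score in rank(["car"], DOCS):
        print(f"{name}: {score:.3f}")
```

In this toy setup, document `d1` never mentions "car" but still receives a non-zero score because the expanded query contains "vehicle". Storing each vector as a dictionary lets documents carry different numbers of concepts, which is one simple way to read "variable document vector dimensions".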

Keywords: Document ranking, Ontology, Spread activation algorithm, Annotation.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1075408
