A Hybrid Ontology Based Approach for Ranking Documents

Authors: Sarah Motiee, Azadeh Nematzadeh, Mehrnoush Shamsfard

Abstract:

The growing volume of information on the Internet increases the need for new (semi-)automatic methods that retrieve documents and rank them by their relevance to the user query. In this paper, after a brief review of ranking models, a new ontology-based approach for ranking HTML documents is proposed and evaluated under various conditions. Our approach combines conceptual, statistical and linguistic methods, and this combination preserves ranking precision without sacrificing speed. Our approach uses natural language processing techniques to extract phrases from the documents and the query and to stem words. An ontology-based conceptual method is then used to annotate documents and expand the query. To expand a query, the spread activation algorithm is improved so that the expansion can be performed flexibly and along various aspects. The annotated documents and the expanded query are then processed with statistical methods to compute their relevance degree. The distinguishing features of our approach are (1) combining conceptual, statistical and linguistic features of documents, (2) expanding the query with its related concepts before comparing it to documents, (3) extracting and using both words and phrases to compute the relevance degree, (4) improving the spread activation algorithm so that the expansion is based on a weighted combination of different conceptual relationships and (5) allowing variable document vector dimensions. A ranking system called ORank is developed to implement and test the proposed model. The test results are presented at the end of the paper.
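The abstract does not give ORank's exact formulas, so the following is only a minimal Python sketch, under stated assumptions, of two of the steps it names: expanding a query by spreading activation over weighted ontology relations, and scoring annotated documents by cosine similarity over vectors whose dimensions are just the concepts that actually occur (one way to read "variable document vector dimensions"). The toy ontology, the relation types (`synonym`, `is_a`, `part_of`), their weights, and the `decay`, `threshold` and `max_depth` parameters are all hypothetical. Phrase extraction, stemming and annotation are assumed to have already produced a list of concepts per document. This is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy ontology: concept -> list of (related concept, relation type).
ONTOLOGY = {
    "car": [("vehicle", "is_a"), ("automobile", "synonym"), ("engine", "part_of")],
    "vehicle": [("transport", "is_a")],
    "automobile": [("car", "synonym")],
    "engine": [("motor", "synonym")],
}

# Hypothetical per-relation weights; the paper's actual weighting is not given here.
RELATION_WEIGHTS = {"synonym": 1.0, "is_a": 0.7, "part_of": 0.4}


def spread_activation(query_concepts, ontology, weights,
                      decay=0.8, threshold=0.1, max_depth=3):
    """Expand query concepts by spreading activation along weighted relations.

    Each hop multiplies activation by the relation weight and a decay factor.
    A concept keeps the highest activation it receives; activations below
    `threshold` are not propagated (all parameter values are assumptions).
    """
    activation = {c: 1.0 for c in query_concepts}
    frontier = dict(activation)
    for _ in range(max_depth):
        next_frontier = {}
        for concept, act in frontier.items():
            for neighbor, relation in ontology.get(concept, []):
                new_act = act * weights.get(relation, 0.0) * decay
                if new_act < threshold:
                    continue
                if new_act > activation.get(neighbor, 0.0):
                    activation[neighbor] = new_act
                    next_frontier[neighbor] = new_act
        if not next_frontier:
            break  # nothing new was activated in this hop
        frontier = next_frontier
    return activation


def cosine(query_vec, doc_vec):
    """Cosine similarity over the union of dimensions present in either vector."""
    dims = set(query_vec) | set(doc_vec)
    dot = sum(query_vec.get(d, 0.0) * doc_vec.get(d, 0.0) for d in dims)
    nq = math.sqrt(sum(v * v for v in query_vec.values()))
    nd = math.sqrt(sum(v * v for v in doc_vec.values()))
    return dot / (nq * nd) if nq and nd else 0.0


def rank(query_concepts, annotated_docs):
    """Rank documents (doc id -> list of annotated concepts) by relevance."""
    expanded = spread_activation(query_concepts, ONTOLOGY, RELATION_WEIGHTS)
    scores = {doc_id: cosine(expanded, Counter(concepts))
              for doc_id, concepts in annotated_docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical annotated documents (concept lists after annotation).
docs = {
    "d1": ["automobile", "engine", "engine"],
    "d2": ["transport", "bicycle"],
    "d3": ["cooking", "recipe"],
}
ranking = rank(["car"], docs)
print(ranking)
```

Here the query "car" never appears literally in any document, yet expansion lets `d1` (via `automobile` and `engine`) rank first and `d2` (via `transport`) second, while the unrelated `d3` scores 0. Taking the maximum activation per concept, rather than summing, is one simple choice that keeps cycles such as `car` and `automobile` from inflating scores.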

Keywords: Document ranking, Ontology, Spread activation algorithm, Annotation.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1060048
