Experiments on Element and Document Statistics for XML Retrieval

Mohamed Ben Aouicha; Mohamed Tmar; Mohand Boughanem; Mohamed Abid

Abstract:

This paper presents an information retrieval model for XML documents based on tree matching. Queries and documents are represented by extended trees. An extended tree is built from the original tree by adding weighted virtual links between each node and its indirect descendants, so that every descendant can be reached directly. Consequently, only one level separates each node from its indirect descendants. This allows the user query to be compared with the document flexibly while respecting the structural constraints of the query. The content of each node is essential for deciding whether a document element is relevant, so it must be taken into account in the retrieval process. We separate the structure-based and the content-based retrieval processes. The content-based score of each node is commonly based on the well-known Tf × Idf criterion. In this paper, we compare this criterion with another one that we call Tf × Ief. The comparison is based on experiments on a dataset provided by INEX, to show the effectiveness of our approach on the one hand and that of the two weighting functions on the other.
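
The difference between the two weighting criteria can be illustrated with a small sketch. The abstract does not give the formulas, so the version below is an assumption: Idf is taken as the logarithm of the number of documents divided by the number of documents containing the term, and Ief (read here as inverse element frequency) as the same ratio computed over elements instead of documents. The toy collection and the function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy collection: each document is a list of elements,
# and each element is a list of terms.
DOCS = [
    [["xml", "retrieval"], ["tree", "matching"]],
    [["xml", "query"], ["xml", "tree"], ["ranking"]],
]


def idf(term, docs):
    """Assumed Idf: log(number of documents / documents containing the term)."""
    n = sum(1 for doc in docs if any(term in element for element in doc))
    return math.log(len(docs) / n) if n else 0.0


def ief(term, docs):
    """Assumed Ief: log(number of elements / elements containing the term)."""
    elements = [element for doc in docs for element in doc]
    n = sum(1 for element in elements if term in element)
    return math.log(len(elements) / n) if n else 0.0


def score(element, query, docs, inverse):
    """Content score of one element: sum over query terms of tf * inverse frequency."""
    tf = Counter(element)
    return sum(tf[term] * inverse(term, docs) for term in query)


if __name__ == "__main__":
    element = DOCS[1][1]          # the element ["xml", "tree"]
    query = ["xml", "tree"]
    print("Tf x Idf:", score(element, query, DOCS, idf))
    print("Tf x Ief:", score(element, query, DOCS, ief))
```

In this toy collection both query terms occur in every document, so the assumed Idf is zero for each of them and the Tf × Idf score cannot separate elements, whereas the element-level statistic still can. This only illustrates why the two criteria may behave differently. It says nothing about which one performs better on the INEX data.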

Keywords: XML retrieval, INEX, Tf × Idf, Tf × Ief

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1070557
