Data Extraction of XML Files using Searching and Indexing Techniques

Authors: Sushma Satpute, Vaishali Katkar, Nilesh Sahare

Abstract:

XML files store data in a well-structured format. Studying that format, or the semantics of its grammar, helps retrieve the data quickly. Many algorithms have been described for searching data in XML files. A number of approaches rely either on the structure of the document or on its content. Structure-based approaches require the user to know the structure of the document, while information retrieval techniques based on natural language processing (NLP) relate only to the content of the document. Hence the results may be irrelevant or unsuccessful, and the search may take more time. This paper presents fast XML retrieval techniques using a new indexing technique and the concept of RXML. When indexing an XML document, the system takes into account both the document content and the document structure, and assigns a value to each tag in the file. To query the system, a user is not constrained to a fixed query format.
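
The indexing idea described above (covering both content and structure, with a value assigned to each tag) can be illustrated with a minimal sketch. The abstract does not give the paper's actual indexing scheme or define RXML, so this is not the authors' method. The integer numbering of tag paths, the function names `build_index` and `search`, and the sample document are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, used only for illustration.
DOC = """
<library>
  <book><title>XML Retrieval</title><author>Ada</author></book>
  <book><title>Indexed Search</title><author>Bob</author></book>
</library>
"""


def build_index(xml_text):
    """Return (tag_ids, postings) for one XML document.

    tag_ids:  tag path -> integer value (assumed numbering scheme)
    postings: term -> set of tag values whose own text contains the term
    """
    root = ET.fromstring(xml_text)
    tag_ids = {}
    postings = defaultdict(set)

    def visit(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        # Structure: each distinct tag path receives the next integer value.
        tag_id = tag_ids.setdefault(path, len(tag_ids) + 1)
        # Content: index the lower-cased words of the element's own text.
        for term in re.findall(r"\w+", (elem.text or "").lower()):
            postings[term].add(tag_id)
        for child in elem:
            visit(child, path)

    visit(root, "")
    return tag_ids, postings


def search(query, tag_ids, postings):
    """Free-form keyword query: return tag paths whose text holds every term."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    hits = set.intersection(*(postings.get(t, set()) for t in terms))
    id_to_path = {value: path for path, value in tag_ids.items()}
    return sorted(id_to_path[i] for i in hits)


tag_ids, postings = build_index(DOC)
print(search("XML retrieval", tag_ids, postings))
```

Here the query is plain keywords, so the user needs no knowledge of the document structure, yet the result still identifies the tag path where the terms occur. A real system would also need ranking and per-element (not only per-path) postings, which this sketch omits.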

Keywords: XML Retrieval, Indexed Search, Information Retrieval.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1055992

