A Keyword-Based Filtering Technique of Document-Centric XML using NFA Representation

Authors: Changwoo Byun, Kyounghan Lee, Seog Park

Abstract:

XML is becoming a de facto standard for online data exchange. Existing XML filtering techniques based on the publish/subscribe model focus on highly structured data marked up with XML tags. These techniques filter data-centric XML documents efficiently but are not effective at filtering the element contents of document-centric XML. In this paper, we propose an extended XPath specification that includes the special matching character '%', as used in the LIKE operation of SQL, because queries that adequately filter element contents are difficult to write with the existing XPath specification. We also present Pfilter, a novel technique for filtering a collection of document-centric XML documents that exploits the extended XPath specification. We report several performance studies that evaluate its efficiency and scalability in terms of multi-query processing time (MQPT).
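
To illustrate the kind of matching the '%' character enables, the following is a minimal sketch, not the Pfilter algorithm itself. It assumes that '%' matches any run of characters (including none), as in SQL LIKE, and simulates a small NFA whose states are positions in the pattern. The names `like_match` and `matching_queries`, and the sample subscriptions, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def like_match(pattern: str, text: str) -> bool:
    """Return True if text matches pattern, where '%' matches any
    (possibly empty) run of characters. Assumed LIKE-style semantics."""
    accept = len(pattern)  # NFA states are pattern positions 0..len(pattern)

    def closure(states: Set[int]) -> Set[int]:
        # '%' may match the empty string, so state i also implies state i + 1.
        result = set(states)
        stack = list(states)
        while stack:
            i = stack.pop()
            if i < accept and pattern[i] == "%" and i + 1 not in result:
                result.add(i + 1)
                stack.append(i + 1)
        return result

    current = closure({0})
    for ch in text:
        nxt: Set[int] = set()
        for i in current:
            if i == accept:
                continue  # no transitions leave the accepting state
            if pattern[i] == "%":
                nxt.add(i)  # self-loop: '%' consumes the character
            elif pattern[i] == ch:
                nxt.add(i + 1)  # literal character consumed
        current = closure(nxt)
        if not current:
            return False  # every NFA branch has died
    return accept in current


def matching_queries(queries: Dict[str, str], content: str) -> List[str]:
    """Return the names of all queries whose pattern matches the content."""
    return [name for name, pat in queries.items() if like_match(pat, content)]


# Hypothetical subscriptions over the text content of one element.
subscriptions = {
    "q1": "%XML%",         # content contains "XML"
    "q2": "Filtering%",    # content starts with "Filtering"
    "q3": "%automata",     # content ends with "automata"
}
matched = matching_queries(subscriptions, "Filtering XML streams with automata")
```

In this sketch every query is evaluated independently against the element content; sharing states across many queries, which is the usual motivation for an NFA representation in multi-query filtering, is left out for brevity.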

Keywords: XML Data Stream, Document-centric XML, Filtering Technique, Value-based Predicates

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1079240

