Deep Web Content Mining

Authors: Shohreh Ajoudanian, Mohammad Davarpanah Jazi

Abstract:

The rapid expansion of the web causes information to grow constantly, which makes it increasingly difficult to extract potentially useful knowledge. Web content mining addresses this problem by gathering explicit information from different web sites for access and knowledge discovery. Query interfaces of web databases share common building blocks. After extracting information with a parsing approach, we use a new data mining algorithm to match a large number of database schemas at a time, which increases the speed of information matching. In addition, instead of simple 1:1 matching, the algorithm performs complex (m:n) matching between query interfaces. In this paper we present a novel correlation mining algorithm that matches correlated attributes at a smaller cost. The algorithm uses the Jaccard measure to distinguish positively and negatively correlated attributes. The system then matches the user query against the different query interfaces in a specific domain and finally chooses the query interface nearest to the user query to answer it.
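The abstract does not give the algorithm itself, so the following is only a minimal sketch of how a Jaccard measure over attribute co-occurrence might separate positively from negatively correlated attributes, and how a user query might be matched to its nearest query interface. The function names, the two thresholds, the example schemas, and the use of set-level Jaccard similarity for the final selection step are all assumptions made for illustration, not details taken from the paper.

```python
from itertools import combinations


def jaccard(attr_a, attr_b, schemas):
    """Jaccard measure of two attributes over a collection of schemas:
    (schemas containing both) / (schemas containing at least one)."""
    both = sum(1 for s in schemas if attr_a in s and attr_b in s)
    either = sum(1 for s in schemas if attr_a in s or attr_b in s)
    return both / either if either else 0.0


def classify_pairs(schemas, pos_threshold=0.5, neg_threshold=0.1):
    """Split attribute pairs into positively and negatively correlated ones.
    Both thresholds are hypothetical values chosen for this example."""
    attributes = sorted(set().union(*schemas))
    positive, negative = [], []
    for a, b in combinations(attributes, 2):
        score = jaccard(a, b, schemas)
        if score >= pos_threshold:
            positive.append((a, b, score))   # attributes that tend to co-occur
        elif score <= neg_threshold:
            negative.append((a, b, score))   # attributes that rarely co-occur
    return positive, negative


def nearest_interface(query_attrs, interfaces):
    """Return the name of the interface whose attribute set is most similar
    to the query, using set-level Jaccard similarity (an assumption)."""
    def similarity(attrs):
        union = query_attrs | attrs
        return len(query_attrs & attrs) / len(union) if union else 0.0
    return max(interfaces, key=lambda name: similarity(interfaces[name]))


# Hypothetical query-interface schemas from a book-search domain.
schemas = [
    {"author", "title"},
    {"author", "title", "subject"},
    {"first name", "last name", "title"},
    {"first name", "last name", "subject"},
]
positive, negative = classify_pairs(schemas)

interfaces = {
    "books_a": {"author", "title", "subject"},
    "books_b": {"first name", "last name", "title"},
}
best = nearest_interface({"author", "title"}, interfaces)
```

In this toy data, "first name" and "last name" always appear together, while "author" never appears alongside either of them. That is the kind of pattern a correlation-based matcher could use to group the first pair and to treat "author" as an alternative to that group, which is one way an m:n match might arise.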

Keywords: Content mining, complex matching, correlation mining, information extraction.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1075729

