A Web Text Mining Flexible Architecture

Authors: M. Castellano, G. Mastronardi, A. Aprile, G. Tarricone

Abstract:

Text Mining is an important step of the Knowledge Discovery process. It is used to extract hidden information from unstructured or semi-structured data. This aspect is fundamental because much of the information on the Web is semi-structured, owing to the nested structure of HTML code, and much of it is linked or redundant. Web Text Mining supports the overall knowledge mining process by mining, extracting, and integrating useful data, information, and knowledge from Web page contents. In this paper, we present a Web Text Mining process able to discover knowledge in a distributed and heterogeneous multi-organization environment. The process is based on a flexible architecture and is implemented in four steps that examine Web content and extract useful hidden information through mining techniques. Our Web Text Mining prototype starts by retrieving job offers from the Web; a Text Mining process then draws out the information needed to classify them quickly, essentially the place of the job offer and the required skills.
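The kind of extraction described above, drawing the place and the skills out of a Web job offer, can be illustrated with a minimal sketch. It is not the prototype presented in the paper: it assumes a simple dictionary lookup over the visible text of an HTML page, and the vocabularies `KNOWN_PLACES` and `KNOWN_SKILLS`, the sample offer, and all function names are hypothetical, introduced only for illustration.

```python
from html.parser import HTMLParser
import re

# Hypothetical vocabularies, chosen only for illustration.
KNOWN_PLACES = {"bari", "rome", "milan"}
KNOWN_SKILLS = {"java", "sql", "python", "html"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Strip the markup from a page and return its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_job_fields(html):
    """Return the known places and skills mentioned in a job offer page."""
    tokens = tokenize(html_to_text(html))
    places = sorted({t for t in tokens if t in KNOWN_PLACES})
    skills = sorted({t for t in tokens if t in KNOWN_SKILLS})
    return {"place": places, "skills": skills}


# Hypothetical job offer page used as sample input.
offer = (
    "<html><body><h1>Developer</h1><p>Location: Bari</p>"
    "<ul><li>Java</li><li>SQL</li></ul>"
    "<script>var python = 1;</script></body></html>"
)
result = extract_job_fields(offer)
print(result)
```

A dictionary lookup is the simplest possible choice here; a real system would more plausibly rely on the mining techniques the paper refers to, and the word "python" inside the script element is ignored because only visible text is examined.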

Keywords: Web text mining, flexible architecture, knowledge discovery.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1062466

