Mining Association Rules from Unstructured Documents

Hany Mahgoub

Abstract:

This paper presents EART (Extract Association Rules from Text), a system for discovering association rules from collections of unstructured documents. EART handles text only, not images or figures. It discovers association rules among the keywords that label the documents in a collection. Its main characteristic is that it integrates XML technology (to transform unstructured documents into structured documents) with an information retrieval scheme (TF-IDF) and a data mining technique for association rule extraction. EART relies on word features to extract association rules. It consists of four phases: a structure phase, an index phase, a text mining phase, and a visualization phase. Our work depends on analyzing the keywords in the extracted association rules in two ways: through the co-occurrence of the keywords in one sentence in the original text, and through the existence of the keywords in one sentence without co-occurrence. Experiments were carried out on a collection of scientific documents selected from MEDLINE that relate to the outbreak of the H5N1 avian influenza virus.
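The combination of TF-IDF keyword weighting with association rule extraction can be illustrated with a minimal sketch. This is not the EART implementation: the function names (`tf_idf`, `top_keywords`, `pair_rules`), the choice of keeping the top `k` terms per document, the restriction to single-antecedent rules, and the toy documents are all assumptions made for illustration. It uses the standard definitions of support (fraction of documents containing both keywords) and confidence (support of the pair divided by support of the antecedent).

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


def top_keywords(weights, k):
    """Keep the k highest-weighted terms of each document (ties broken alphabetically)."""
    return [set(sorted(w, key=lambda t: (-w[t], t))[:k]) for w in weights]


def pair_rules(keyword_sets, min_support, min_confidence):
    """Mine rules of the form a -> b from per-document keyword sets."""
    n = len(keyword_sets)
    item_count = Counter(t for s in keyword_sets for t in s)
    pair_count = Counter(
        p for s in keyword_sets for p in combinations(sorted(s), 2)
    )
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for x, y in ((a, b), (b, a)):
            confidence = c / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


if __name__ == "__main__":
    # Toy tokenized documents, invented for this example only.
    docs = [
        "h5n1 outbreak poultry h5n1 outbreak".split(),
        "h5n1 outbreak vaccine trial".split(),
        "vaccine trial dosage".split(),
        "poultry farm outbreak h5n1".split(),
    ]
    keyword_sets = top_keywords(tf_idf(docs), k=3)
    for x, y, s, c in pair_rules(keyword_sets, min_support=0.5, min_confidence=0.6):
        print(f"{x} -> {y}  support={s:.2f} confidence={c:.2f}")
```

Filtering by TF-IDF before mining keeps the rule search small, since only the terms that characterize each document enter the pair counts. A full system would likely mine larger itemsets as well (for example with an Apriori-style algorithm), which this sketch omits.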

Keywords: Association rules, information retrieval, knowledge discovery in text, text mining.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1330457
