A Text Mining Technique Using Association Rules Extraction
Authors: Hany Mahgoub, Dietmar Rösner, Nabil Ismail, Fawzy Torkey
Abstract:
This paper describes a text mining technique for automatically extracting association rules from collections of textual documents. The technique is called Extracting Association Rules from Text (EART). It relies on keyword features to discover association rules among the keywords labeling the documents. The EART system ignores the order in which words occur and focuses instead on the words and their statistical distributions in the documents. The main contributions of the technique are that it integrates XML technology with an Information Retrieval scheme (TFIDF) for keyword/feature selection, which automatically selects the most discriminative keywords for use in association rule generation, and that it uses a Data Mining technique for association rule discovery. It consists of three phases: a Text Preprocessing phase (transformation, filtration, stemming, and indexing of the documents), an Association Rule Mining (ARM) phase (applying our algorithm for Generating Association Rules based on a Weighting scheme, GARW), and a Visualization phase (visualization of the results). Experiments were conducted on web page news documents related to the outbreak of bird flu. The extracted association rules contain important features and describe the informative news contained in the document collection. The performance of the EART system is compared with that of another system that uses the Apriori algorithm, in terms of execution time and an evaluation of the extracted association rules.
Keywords: Text mining, data mining, association rule mining
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1058536
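To make the pipeline in the abstract concrete, the following is a minimal Python sketch of its two core ideas: TFIDF-based keyword selection, followed by association rule discovery over the selected keywords using support and confidence. It is not the GARW algorithm and not the EART implementation, neither of which is specified in this abstract. The function names (`tfidf_keywords`, `pair_rules`), the `top_k` cutoff, the thresholds, the restriction to rules between keyword pairs, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_keywords(docs, top_k=3):
    """Keep the top_k highest-scoring TF-IDF terms of each document.

    docs is a list of token lists (assumed already filtered and stemmed).
    Word order is ignored; only term counts matter.
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    keyword_sets = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf}
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keyword_sets.append(set(ranked[:top_k]))
    return keyword_sets


def pair_rules(keyword_sets, min_support=0.5, min_confidence=0.8):
    """Return rules x -> y as tuples (x, y, support, confidence).

    support    = fraction of documents whose keywords include both x and y
    confidence = (documents with x and y) / (documents with x)
    """
    n = len(keyword_sets)
    item_count = Counter(k for ks in keyword_sets for k in ks)
    pair_count = Counter(
        pair for ks in keyword_sets for pair in combinations(sorted(ks), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    # Hypothetical toy corpus, not data from the paper.
    docs = [
        ["bird", "flu", "outbreak", "outbreak"],
        ["bird", "flu", "vaccine"],
        ["bird", "market"],
    ]
    keywords = tfidf_keywords(docs, top_k=2)
    for rule in pair_rules(keywords, min_support=0.3, min_confidence=0.5):
        print(rule)
```

Note that a term occurring in every document gets an inverse document frequency of log(1) = 0, so it is never preferred as a keyword. This is the sense in which a weighting scheme favors discriminative keywords before any rules are generated.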