Bottom Up Text Mining through Hierarchical Document Representation

Authors: Y. Djouadi, F. Souam

Abstract:

Most existing text mining approaches are designed with the transaction database model in mind. The mined dataset is thus structured using a single concept, the "transaction", while the dataset as a whole is modeled as a "set" abstract type. In such cases, the structure of the whole dataset and the relationships among the transactions themselves are not modeled and, consequently, are not considered in the mining process. We believe that taking into account the structural properties of hierarchically structured information (e.g., textual documents) in the mining process can lead to better results. For this purpose, a hierarchical association rule mining approach for textual documents is proposed in this paper, and the classical set-oriented mining approach is reconsidered in favor of a Directed Acyclic Graph (DAG) oriented approach. Natural language processing techniques are used to obtain the DAG structure. Based on this graph model, a hierarchical bottom-up algorithm is proposed. The main idea is that each node is mined with its parent node.
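The abstract does not give the algorithm itself, so the following is only a minimal sketch of one possible reading of "each node is mined with its parent node". It assumes a simplified hierarchy in which every node has a single parent (a tree, which is a special case of a DAG), builds one transaction per node by merging its terms with those of its parent, visits nodes deepest first, and counts co-occurring term pairs. The node names, the terms, the helper functions, and the `min_support` threshold are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical document hierarchy (document -> sections -> paragraphs).
# Every name and term below is an illustrative assumption.
NODES = {
    "doc":  {"parent": None,   "terms": {"mining"}},
    "sec1": {"parent": "doc",  "terms": {"text", "graph"}},
    "sec2": {"parent": "doc",  "terms": {"text", "rules"}},
    "p1":   {"parent": "sec1", "terms": {"dag"}},
    "p2":   {"parent": "sec1", "terms": {"dag", "rules"}},
}


def depth(nodes, node_id):
    """Number of edges between a node and the root of the hierarchy."""
    d = 0
    while nodes[node_id]["parent"] is not None:
        node_id = nodes[node_id]["parent"]
        d += 1
    return d


def bottom_up_transactions(nodes):
    """Visit nodes deepest first and merge each node's terms with its parent's."""
    order = sorted(nodes, key=lambda n: depth(nodes, n), reverse=True)
    transactions = []
    for node_id in order:
        parent = nodes[node_id]["parent"]
        if parent is None:
            continue  # the root has no parent to be mined with
        merged = nodes[node_id]["terms"] | nodes[parent]["terms"]
        transactions.append((node_id, merged))
    return transactions


def frequent_pairs(transactions, min_support):
    """Count term pairs over all transactions and keep the frequent ones."""
    counts = Counter()
    for _, terms in transactions:
        counts.update(combinations(sorted(terms), 2))
    return {pair: c for pair, c in counts.items() if c >= min_support}


transactions = bottom_up_transactions(NODES)
print(frequent_pairs(transactions, min_support=3))
```

In this toy hierarchy, the pair ("graph", "text") reaches a support of 3 only because the terms of `sec1` are carried into the transactions of its child paragraphs; a flat set-of-transactions model over the same nodes would not see that co-occurrence.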

Keywords: Graph based association rules mining, Hierarchical document structure, Text mining.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1062746

