W3-Miner: Mining Weighted Frequent Subtree Patterns in a Collection of Trees
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 33122
W3-Miner: Mining Weighted Frequent Subtree Patterns in a Collection of Trees

Authors: R. AliMohammadzadeh, M. Haghir Chehreghani, A. Zarnani, M. Rahgozar

Abstract:

Mining frequent tree patterns have many useful applications in XML mining, bioinformatics, network routing, etc. Most of the frequent subtree mining algorithms (i.e. FREQT, TreeMiner and CMTreeMiner) use anti-monotone property in the phase of candidate subtree generation. However, none of these algorithms have verified the correctness of this property in tree structured data. In this research it is shown that anti-monotonicity does not generally hold, when using weighed support in tree pattern discovery. As a result, tree mining algorithms that are based on this property would probably miss some of the valid frequent subtree patterns in a collection of trees. In this paper, we investigate the correctness of anti-monotone property for the problem of weighted frequent subtree mining. In addition we propose W3-Miner, a new algorithm for full extraction of frequent subtrees. The experimental results confirm that W3-Miner finds some frequent subtrees that the previously proposed algorithms are not able to discover.

Keywords: Semi-Structured Data Mining, Anti-Monotone Property, Trees.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1080472

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1384

References:


[1] Y. Chi, S. Nijssen, R.R. Muntz, J. N. Kok, "Frequent Subtree Mining An Overview," Fundamental Informatics, Special Issue on Graph and Tree Mining, 2005.
[2] M.J. Zaki, "Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications," in IEEE Transaction on Knowledge and Data Engineering, vol. 17, no. 8, pp. 1021-1035, 2005.
[3] H. Tan, T.S. Dillon, L. Feng, E. Chang, F. Hadzic, "X3-Miner: Mining Patterns from XML Database," In Proc. Data Mining '05. Skiathos, Greece, 2005.
[4] K. Abe, S. Kawasoe, T. Asai, H. Arimura, and S. Arikawa, "Optimized Substructure Discovery for Semi-structured Data," In Proc. PKDD-02, 1-14, LNAI 2431, 2002.
[5] M. J Zaki,.. Efficient Mining of Trees in the Forest. SIGKDD '02, Edmonton, Alberta, Canada, ACM. 2002.
[6] M. J. Zaki and C. C. Aggarwal. XRules: An effective structural classifier for XML data. In Proc. of the 2003 Int. Conf. Knowledge Discovery and Data Mining, 2003.
[7] T. Asai, H. Arimura, T. Uno, and S. Nakano. Discovering frequent substructures in large unordered trees. In Proc. of the 6th Intl. Conf. on Discovery Science, 2003.
[8] Y. Chi, Y. Yang, and R. R. Muntz. Mining frequent rooted trees and free trees using canonical forms. Technical Report CSD-TR No. 030043, UCLA, 2003.
[9] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Inkeri Verkamo, "Fast Discovery of Association Rules," Advances in Knowledge Discovery, and Data Mining, U. Fayyad et al., eds.,pp. 307- 328, Menlo Park, Calif.: AAAI Press, 1996.
[10] M.J. Zaki, "Fast Vertical Mining Using Diffsets," In. Proc. of Int. Conf. Knowledge Discovery and Data Mining (SIGKDD-03), 2003.
[11] M. Zaki. Efficiently mining frequent embedded unordered trees. Fundamental Informatics, 65:1-20, 2005.
[12] B. Shapiro and K. Zhang, "Comparing Multiple RNA Secondary Structures Using Tree Comparisons," Computer Applications in Biosciences, vol. 6, no. 4, pp. 309-318, 1990.