Revised PLWAP Tree with Non-frequent Items for Mining Sequential Pattern
Authors: R. Vishnu Priya, A. Vadivel
Abstract:
Sequential pattern mining is a challenging task in data mining with a wide range of applications, one of which is mining patterns from weblogs. Weblogs today are highly dynamic, and some of their entries may become obsolete over time. In addition, users may change the threshold value frequently during the mining process until they obtain the required output or find interesting rules. Some recently proposed algorithms for mining weblogs build the tree with two scans and consume considerable time and space. In this paper, we build the Revised PLWAP with Non-frequent Items tree (RePLNI-tree) with a single scan that covers all items. While mining sequential patterns, the links related to the non-frequent items are not considered. Hence, it is not required to delete or maintain node information while revising the tree to mine updated transactions. The algorithm supports both incremental and interactive mining: the patterns need not be re-computed each time the weblog is updated or the minimum support is changed. The proposed tree performs better even when the incremental database is more than 50% of the size of the existing one. For evaluation, we used a benchmark weblog dataset and found the performance of the proposed tree encouraging compared with some recently proposed approaches.
Keywords: Sequential pattern mining, weblog, frequent and non-frequent items, incremental and interactive mining.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1073589
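The single-scan idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' RePLNI-tree: it is a plain prefix tree that omits PLWAP position codes and node linkage, and every name in it (`AllItemsTree`, `insert`, `frequent_view`) is hypothetical. It only shows why keeping all items in the tree lets the minimum support change, or new sequences arrive, without rebuilding: non-frequent items are skipped at read time rather than removed.

```python
from collections import Counter


class Node:
    def __init__(self, item=None):
        self.item = item
        self.count = 0          # number of sequences passing through this node
        self.children = {}


class AllItemsTree:
    """Hypothetical simplified tree that stores every item, frequent or not."""

    def __init__(self):
        self.root = Node()
        self.item_support = Counter()   # sequences containing each item
        self.n_sequences = 0

    def insert(self, sequence):
        """One pass over one access sequence; nothing is filtered out."""
        self.n_sequences += 1
        self.item_support.update(set(sequence))
        node = self.root
        for item in sequence:
            node = node.children.setdefault(item, Node(item))
            node.count += 1

    def frequent_items(self, min_support):
        """Items whose support ratio reaches min_support (assumed in [0, 1])."""
        threshold = min_support * self.n_sequences
        return {i for i, c in self.item_support.items() if c >= threshold}

    def frequent_view(self, min_support):
        """Return stored sequences with non-frequent items skipped, with counts."""
        keep = self.frequent_items(min_support)
        out = Counter()

        def walk(node, prefix):
            # Sequences that end here = passes through minus passes onward.
            ends = node.count - sum(c.count for c in node.children.values())
            if node is not self.root and ends > 0:
                out[tuple(prefix)] += ends
            for child in node.children.values():
                walk(child, prefix + [child.item] if child.item in keep else prefix)

        walk(self.root, [])
        return out


# Illustrative access sequences (made up for this sketch, one letter per page).
tree = AllItemsTree()
for seq in ["abdac", "eaebcac", "babfaec", "afbacfc"]:
    tree.insert(seq)

strict = tree.frequent_view(0.75)   # only a, b, c are frequent
loose = tree.frequent_view(0.50)    # e and f become frequent, no rebuild needed
```

Because the tree is never pruned, an incremental update is just another call to `insert`, and an interactive change of threshold is just another call to `frequent_view`. A full miner would then extract sequential patterns from the filtered view; that step is left out here.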