An Efficient Approach to Mining Frequent Itemsets on Data Streams

Sara Ansari; Mohammad Hadi Sadreddini

Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 32797

An Efficient Approach to Mining Frequent Itemsets on Data Streams

Authors: Sara Ansari, Mohammad Hadi Sadreddini

Abstract:

The increasing importance of data stream arising in a wide range of advanced applications has led to the extensive study of mining frequent patterns. Mining data streams poses many new challenges amongst which are the one-scan nature, the unbounded memory requirement and the high arrival rate of data streams. In this paper, we propose a new approach for mining itemsets on data stream. Our approach SFIDS has been developed based on FIDS algorithm. The main attempts were to keep some advantages of the previous approach and resolve some of its drawbacks, and consequently to improve run time and memory consumption. Our approach has the following advantages: using a data structure similar to lattice for keeping frequent itemsets, separating regions from each other with deleting common nodes that results in a decrease in search space, memory consumption and run time; and Finally, considering CPU constraint, with increasing arrival rate of data that result in overloading system, SFIDS automatically detect this situation and discard some of unprocessing data. We guarantee that error of results is bounded to user pre-specified threshold, based on a probability technique. Final results show that SFIDS algorithm could attain about 50% run time improvement than FIDS approach.

Keywords: Data stream, frequent itemset, stream mining.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1328918

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1373

References:

[1] C.Raissi, P. Poncelet, Teisseire, "Towards a new approach for mining frequent itemsets on data stream", J Intell Inf Syst , 2007, vol. 28, pp. 23-36, 2007.
[2] J,.Xu Yu , Z.Chong , H. Lu, Z. Zhang , A. Zhou b," A false negative approach to mining frequent itemsets from high speed transactional data streams", Information Sciences 176, 1986-2015,2006.
[3] G. Giannella, J. Han, J. Pei, X. Yan, P.Yu," Mining frequent patterns in data streams at multiple time granularities", In Next generation data mining. New York: MIT, 2003.
[4] H. Li, F. Lee, S.Y., M. Shan," An efficient algorithm for mining frequent itemsets over the entire history of data streams".,In Proceedings of the 1st InternationalWorkshop on Knowledge Discovery in Data streams,2004.
[5] R. Jin, G. Agrawal," An Algorithm for In-Core Frequent Itemset Mining on Streaming Data"
[6] P. Domingos , G. Hulten," Mining high-speed data streams",In Proceedings of the ACM Conference on Knowledge and Data Discovery (SIGKDD), 2000.
[7] A. Cheng, Y. Ke,W. Ng, "A survey on algorithms for mining frequent itemsets over data streams", Knowl Inf Syst, 2007.
[8] M. Charikar, K. Chen, M. Farach," Finding frequent items in data streams". Theor Comput Sci, vol. 312, pp.3-15, 2004.
[9] J.Cheng,Y. Ke,W. Ng," Maintaining frequent itemsets over high-speed data streams", In Proceedings of the 10th Pacific-asia Conference on knowledge discovery and data mining, Singapore, pp. 462-467, April 2006.
[10] R.Agrawal,T. Imielinski,A. Swami , "Mining association rules between sets of items in large databases", In Proceedings of the ACM SIGMOD international conference on management of data, Washington DC, pp 207-216,1993.
[11] H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, The Annals of Mathematical Statistics 23 (4) 493-507, 1953.
[12] M. Charikar, K. Chen, M. Farach," Finding frequent items in data streams", in Proceedings of the International Colloquium on Automata, Languages and Programming (ICALP), pp. 693-703,2002.
[13] T. Calders , N. Dexters , B. Goethals ,"Mining Frequent Items in a Stream Using Flexible Windows"
[14] X. Sun Maria, E. Orlowska , X. Li, "Finding Frequent Itemsets in High- Speed Data Streams".
[15] X. Han Dong, W. Ng, K. Wong,V. Lee, "Discovering Frequent Sets from Data Streams with CPU Constraint", This paper appeared at the AusDM 2007, Gold Coast, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol.70
[16] H. Chernoff, "A measure of asymptotic efficiency for thests of a hypothesis based on the sum of observations", in Annals of Mathematical Statistics, pp. 493, 1952.
[17] R. Agrawal, R. Srikant, " Fast algorithms for mining association rules", In Proc. Int. Conf. Very Large Data Bases (VLDB'94), 487.499, 1994
[18] J. Han, J. Pei, Y. Yin," Mining frequent patterns without candidate generation", In Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'00), 1.12, 2000.
[19] J. Pei, J. Han, R. Mao," CLOSET: An efficient algorithm for mining frequent closed itemsets", In Proc. ACM-SIGMOD Int.Workshop Data Mining and Knowledge Discovery (DMKD'00), 11.20,2000.
[20] M. Zaki, C. Hsiao," CHARM: An efficient algorithm for closed itemset mining", In Proc. SIAM Int. Conf. Data Mining, 457.473, 2002.
[21] R. Agrawal, R. Srikant," Mining sequential patterns", In Proc. Int. Conf. Data Engineering (ICDE'95), 3.14, 1995.
[22] M. Kuramochi, G. Karypis, "Frequent subgraph discovery", In Proc.Int. Conf. Data Mining (ICDM'01), 313.320, 2001.
[23] K. Beyer, R. Ramakrishnan," Bottom-up computation of sparse and iceberg cubes", In Proc. ACM-SIGMOD Int. Conf.Management of Data (SIGMOD'99), 359.370,1999.
[24] T. Imielinski, L. Khachiyan,A. Abdulghani,"Cubegrades: Generalizing association rules", Data Mining and knowledge Discovery 6:219.258,2002.
[25] B. Liu, W. Hsu, Y. Ma," Integrating classification and association rule mining", In Proc. Int. Conf. Knowledge Discovery and Data Mining (KDD'98), 80.86,1998.
[26] H. Wang, J. Yang, W. Wang, P. Yu," Clustering by pattern similarity in large data sets", In Proc. ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'02), 418.427,2002.
[27] R. Karp, C. Papadimitriou, S. Shenker, "A simple algorithm for finding frequent elements in streams and bags", ACM Trans. Database Systems 2003.
[28] Y. Chi, H. Wang, P. Yu, R. Muntz," Moment: Maintaining closed frequent itemsets over a stream sliding window", In Proceedings of International Conference on Data Missing Conference, pp. 59-66, 2004.
[29] C.Jin, W. Qian, C. Sha, J.Yu, A. Zhou," Dynamically maintaining frequent items over a data stream", In Proceedings of International Conference on Information and Knowledge Management Conference, pp. 287-29, Washington, District of Columbia, 2004.
[30] H. Li, S. Lee, M. Shan, "An efficient algorithm for mining frequent itemsets over the entire history of data streams", In Proceedings of the 1st International Workshop on Knowledge Discovery in Data streams, 2003.
[31] G. Manku,R. Motwani, " Approximate frequency counts over data streams. In Proceedings of very Large Databases Conference, pp. 346- 357, Hong Kong, China, 2002.
[32] Y. Chen, G. Dong, J. Han, B. Wah, J. Wang, ," Multidimensional regression analysis of time-series data streams" In VLDB Conference,2002.
[33] W. Teng, M. Chen, P. Yu, "A regression-based temporal patterns mining schema for data streams", In Proceedings of very large Databases Conference, pp. 93-104, Berlin, 2003.
[34] X. Hong , W. Keong , K. Leong Ong ,v. C S Lee , "Discovering Frequent Sets from Data Streams with CPU Constraint", Sixth Australasian Data Mining Conference, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 70, 2007.