Approximate Frequent Pattern Discovery Over Data Stream

Kittisak Kerdprasop; Nittaya Kerdprasop

Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 33122

Approximate Frequent Pattern Discovery Over Data Stream

Authors: Kittisak Kerdprasop, Nittaya Kerdprasop

Abstract:

Frequent pattern discovery over data stream is a hard problem because a continuously generated nature of stream does not allow a revisit on each data element. Furthermore, pattern discovery process must be fast to produce timely results. Based on these requirements, we propose an approximate approach to tackle the problem of discovering frequent patterns over continuous stream. Our approximation algorithm is intended to be applied to process a stream prior to the pattern discovery process. The results of approximate frequent pattern discovery have been reported in the paper.

Keywords: Frequent pattern discovery, Approximate algorithm, Data stream analysis.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1072497

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1345

References:

[1] C. Aggarwal, J. Han, J. Wang, and P. Yu, "A framework for clustering evolving data streams," in Pro. Very Large Data Bases, 2003.
[2] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," in Proc. ACM SIGMOD Int. Conf. Management of Data, 1993, pp. 207-216.
[3] R. Agrawal and R. Srikant, "Fast algorithm for mining association rules," in Proc. Int. Conf. Very Large Data Bases, 1994, pp. 487-499.
[4] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, "Model and issues in data stream systems," in Pro. ACM PODS, 2002.
[5] J. Bilmes, "A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models," Dept. Electrical Engineering and Computer Science, University of California Berkeley, Technical Report TR-97-021, 1998.
[6] G. Coremode and S. Muthukrishnan, "What-s hot and what-s not: Tracking most frequent items dynamically," in Pro. ACM PODS, 2003.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society B, vol. 39, pp. 1-22, 1977.
[8] P. Domingos and G. Hulten, "A general method to scaling up machine learning algorithms and its application to clustering," in Pro. ICML, 2001.
[9] M. A. T. Figueiredo and A. K. Jain, "Unsupervised learning of finite mixture models," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, pp. 381-396, 2002.
[10] M. Gaber, A. Zaslavsky, and S. Krishnaswamy, "Mining data stream: A review," SIGMOD Record, vol. 34, pp. 18-26, 2005.
[11] A. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss, "One-pass wavelet decompositions of data streams," IEEE Trans. Knowledge and Data Engineering, vol. 15, pp. 541-554, 2003.
[12] J. M. Marin, K. Mengersen, and C. Robert, "Bayesian modelling and inference on mixtures of distributions," in Handbook of Statistics, vol. 25, Elsevier-Science, 2005.
[13] S. Muthukrishnan, "Data streams: Algorithms and applications," in Proc. ACM-SIAM Symposium on Discrete Algorithm, 2003.
[14] B. Resch, "A tutorial for the course computational intelligence," Available: http://www.igi.tugraz.at/lehre/CI
[15] J. von Neumann, "Various techniques used in connection with random digits," Applied Mathematics Series, vol. 12, National Bureau of Standards, Washington, D.C., 1951.