Application and Limitation of Parallel Modeling in Multidimensional Sequential Pattern

Authors: Mahdi Esmaeili, Mansour Tarafdar

Abstract:

The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is the discovery of frequently occurring patterns in sequential data. In a multidimensional sequence, each event depends on more than one dimension. The search space is therefore very large, and serial algorithms do not scale to very large datasets. To address this, it is necessary to study scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequences and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and scalable, acceptable speedup across different numbers of processors and problem sizes, and demonstrate that our approach can work efficiently in a real parallel computing environment.
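As an illustration of what data parallelism can mean for sequence mining, the following minimal sketch partitions a sequence database across workers, counts candidate support locally in each partition, and sums the partial counts. It is not the authors' algorithm: the toy database, the two dimensions (product, city), and the names `is_subsequence`, `local_support`, and `parallel_support` are assumptions made for this example, and threads stand in for the processors of a real parallel environment.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: each sequence is a list of events, and each event
# is a tuple with one value per dimension (here: product, city).
DATABASE = [
    [("a", "NY"), ("b", "NY"), ("c", "LA")],
    [("a", "NY"), ("c", "LA")],
    [("b", "NY"), ("a", "NY"), ("c", "LA")],
    [("a", "LA"), ("b", "NY")],
]


def is_subsequence(pattern, sequence):
    """Return True if the events of pattern occur in sequence in order."""
    remaining = iter(sequence)
    # "event in remaining" advances the iterator, so order is enforced.
    return all(event in remaining for event in pattern)


def local_support(partition, candidates):
    """Count, within one partition, how many sequences contain each candidate."""
    counts = Counter()
    for sequence in partition:
        for pattern in candidates:
            if is_subsequence(pattern, sequence):
                counts[pattern] += 1
    return counts


def parallel_support(database, candidates, workers=2):
    """Split the database round-robin, count locally, then sum the counts."""
    partitions = [database[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(
            pool.map(local_support, partitions, [candidates] * workers)
        )
    total = Counter()
    for counts in partial:
        total.update(counts)  # Counter.update adds counts together
    return total


# Candidate patterns are tuples of events so they can be used as keys.
CANDIDATES = [
    (("a", "NY"), ("c", "LA")),
    (("b", "NY"), ("c", "LA")),
    (("a", "NY"), ("b", "NY")),
]

support = parallel_support(DATABASE, CANDIDATES, workers=2)
min_support = 2  # assumed threshold for this example
frequent = [p for p in CANDIDATES if support[p] >= min_support]
```

Because each sequence lives in exactly one partition, the summed local counts equal the global support regardless of how many workers are used, which is what makes this style of partitioning straightforward to balance.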

Keywords: Sequential Patterns, Data Mining, Parallel Algorithm, Multidimensional Sequence Data

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1057041

