Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 30123
A Comparative Study of GTC and PSP Algorithms for Mining Sequential Patterns Embedded in Database with Time Constraints

Authors: Safa Adi

Abstract:

This paper will consider the problem of sequential mining patterns embedded in a database by handling the time constraints as defined in the GSP algorithm (level wise algorithms). We will compare two previous approaches GTC and PSP, that resumes the general principles of GSP. Furthermore this paper will discuss PG-hybrid algorithm, that using PSP and GTC. The results show that PSP and GTC are more efficient than GSP. On the other hand, the GTC algorithm performs better than PSP. The PG-hybrid algorithm use PSP algorithm for the two first passes on the database, and GTC approach for the following scans. Experiments show that the hybrid approach is very efficient for short, frequent sequences.

Keywords: Database, GTC algorithm, PSP algorithm, sequential patterns, time constraints.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1340597

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 256

References:


[1] F. Masseglia, P. Poncelet, M. Teisseire. ”Efficient mining of sequential patterns with time constraints: Reducing the combinations”. Expert Systems with Applications, Volume 36, pages 2677-2690, 2009.
[2] F. Cathala, F. Masseglia, P. Poncelet. ”The PSP Approach for Mining Sequential Patterns”. ACM, pagesv176-184, 1998.
[3] Th. Rincy.N, Y. Pandey. ”Performance evaluation on state of the art sequential pattern mining algorithms”. International Journal of Computer Applications (0975-8887), Volume 65, N0. 14, March 2013.
[4] M. Lin, S. Hseueh, C. Chang. ”Mining closed sequential patterns with time constraints”. International Journal Of Information Science And Engineering Volume 24, pages 33-46, 2008.
[5] M. Morzy, T. Zakrzewicz, ”Efficient constraint-based sequential pattern mining using dataset filtering techniques”. In Proceedings of the Baltic conference, BalticDB IS, pages 213-224, 2002.
[6] M. J. Zaki, ”SPADE: An efficient algorithm for mining frequent sequences”. Machine Learning Journal, Volume 42, NO. 1, pages 31-60, 2001.
[7] R. Srikant and R. Agrawal. ”Mining Sequential Patterns: Generalizations and Performance Improvements”. In Proc. of the EDBT’96, Avignon, France, Sept 1996.
[8] U. M. Fayad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors. ”Advances in Knowledge Discovery and Data Mining”. AAAI Press, 1996.
[9] J. Wijsen. ”Condensed Representation of Database Repairs for Consistent Query Answering”. In International Conference on Database Theory (ICDT), Springer-Verlag, LNCS 2572, pages 378-393, 2003.
[10] P. C. Kanellakis. ”Elements of Relational Database Theory”. In Jan van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, chapter 17, Elsevier/MIT Press, pages 1073-1158, 1990.
[11] J. Fry, G. Xian, S. Jin, J. Dewitz, C. Homer, L. Yang, C. Barnes, N. Herold, J. Wickham. ”Completion of the 2006 National Land Cover Database for the conterminous United States”, Photogrammetric Engineering Remote Sensing, Volume 77, NO. 9, pages 858-863, 2011.
[12] B. C. Ooi, C. Yu, K. L. Tan, and H. V. Jagadish. ”Indexing the distance: an efficient method to knn processing”. In Procdf the Int. Conf. on Very Large Data Bases, 2001.
[13] K. V. Ravi Kanth, D. Agrawal, Amr El Abbadi, and Ambuj K. Singh. ”Dimensionality reduction for similarity searching in dynamic databases”. In Proc. A CM SIGMOD Int. Conf. on Management of Data, 1998.
[14] J. Griss, Y. Perez-Riverol, H. Hermjakob, J. A. Vizcaino. ”Identifying novel biomarkers through data mining-A realistic scenario”. Proteomics Clin. Appl., Volume 9, pages 437-443, 2015.
[15] N. Roussopoulos, S. Kelley, and F. Vincent. ”Nearest neighbor queries”. In Proc. A CM SIGMOD Int. Conf. on Management of Data, pages 71-79, May 1995.
[16] H. Samet. ”Recent developments in linear quadtree-based geographic information systems”. Image and Vision Computing, Volume 5, NO. 3, pages 187-197, Aug. 1987.
[17] T. Sellis, N. Roussopoulos, and C. Faloutsos. ”The r+-tree: A dynamic index for multi-dimensional objects”. Procdf the Int. Conf. on Very Large Data Bases, Volume 13, pages 507-518, 1988.
[18] Y. Theodoridis and T. K. Sellis. ”Optimization issues in r-tree construction”. In Geographic Information Systems (IGIS), pages 270-273, 1994.
[19] F. Wang. ”Relational-linear quadtree approach for two-dimensional spatial representation and manipulation”. IEEE Trans. on Knowledge and Data Engineering, Volume 3, NO. 1, pages 118-122, Mar. 1991.
[20] J. Chomicki, J. Marcinkowski, and S. Staworko, ”Computing consistent query answers using conflict hypergraphs,” in CIKM, D. Grossman, L. Gravano, C. Zhai, O. Herzog, and D. A. Evans, Eds. ACM, pages 417-426, 2004.