Commenced in January 2007
Frequency: Monthly
Edition: International
ISC–Intelligent Subspace Clustering, A Density Based Clustering Approach for High Dimensional Dataset

Authors: Sunita Jahirabadkar, Parag Kulkarni

Abstract:

Many real-world data sets have a very high-dimensional feature space. Most clustering techniques use the distance or similarity between objects as the measure for building clusters. However, in high-dimensional spaces, distances between points become relatively uniform, so the nearest and farthest points from any given object become hard to distinguish. In such cases, density-based approaches may give better results. Subspace clustering algorithms automatically identify the lower-dimensional subspaces of the high-dimensional feature space in which clusters exist. In this paper, we propose a new clustering algorithm, ISC (Intelligent Subspace Clustering), which tries to overcome three major limitations of existing state-of-the-art techniques. First, ISC determines input parameters such as the ε-distance at the various levels of subspace clustering, which helps in finding meaningful clusters. Second, because a uniform-parameter approach is not suitable for different kinds of databases, ISC implements dynamic and adaptive determination of meaningful clustering parameters based on a hierarchical filtering approach. Third, and most important, ISC supports incremental learning and the dynamic inclusion and exclusion of subspaces, which leads to better cluster formation.
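The claim that distances become relatively uniform, and the motivation for choosing ε separately at each subspace level, can be illustrated with a short sketch. This is not ISC itself: the abstract does not state ISC's parameter rule, so `adaptive_eps` uses an assumed heuristic (the median distance to the k-th nearest neighbor inside the subspace), and every function name here is hypothetical. `relative_contrast` computes (max - min) / min of the distances from a query point, and `core_points` applies the DBSCAN-style core-point test (at least `min_pts` points within ε) restricted to a chosen set of dimensions.

```python
import math
import random


def euclidean(a, b, dims):
    """Euclidean distance between points a and b, using only the dimensions in `dims`."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))


def relative_contrast(points, query, dims):
    """(max - min) / min distance from `query` to `points` in the subspace `dims`.

    Values near 0 mean the nearest and farthest points are almost equally far
    away, i.e. distances have become "relatively uniform".
    """
    dists = [euclidean(query, p, dims) for p in points]
    return (max(dists) - min(dists)) / min(dists)


def kth_nn_distance(points, i, dims, k):
    """Distance from points[i] to its k-th nearest other point in subspace `dims`."""
    dists = sorted(euclidean(points[i], p, dims)
                   for j, p in enumerate(points) if j != i)
    return dists[k - 1]


def adaptive_eps(points, dims, k=4):
    """ASSUMPTION: per-subspace epsilon = median k-NN distance in that subspace.

    This is an illustrative heuristic, not ISC's published rule.
    """
    dists = sorted(kth_nn_distance(points, i, dims, k) for i in range(len(points)))
    return dists[len(dists) // 2]


def core_points(points, dims, eps, min_pts):
    """Indices of DBSCAN-style core points in subspace `dims`.

    A point is a core point if at least `min_pts` points (itself included)
    lie within distance `eps` of it.
    """
    return [i for i, p in enumerate(points)
            if sum(1 for q in points if euclidean(p, q, dims) <= eps) >= min_pts]


def demo(n=200, full_dim=100, seed=0):
    """Print relative contrast and the assumed adaptive epsilon for growing subspaces."""
    rng = random.Random(seed)
    data = [[rng.random() for _ in range(full_dim)] for _ in range(n)]
    query = [rng.random() for _ in range(full_dim)]
    for d in (2, 10, full_dim):
        dims = range(d)  # hypothetical subspace: the first d dimensions
        print(f"dims={d:3d}  relative contrast={relative_contrast(data, query, dims):.3f}")
    for d in (2, 10):
        dims = range(d)
        eps = adaptive_eps(data, dims)
        n_core = len(core_points(data, dims, eps, min_pts=5))
        print(f"dims={d:3d}  eps={eps:.3f}  core points={n_core}")


if __name__ == "__main__":
    demo()
```

On uniformly random data the relative contrast typically drops sharply as more dimensions are included, while the assumed ε grows with subspace dimensionality. This is one way to see why a single global ε is a poor fit across subspace levels; with `min_pts = k + 1`, the median k-NN rule also guarantees that at least half of the points qualify as core points in each subspace.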

Keywords: Density-based clustering, high-dimensional data, subspace clustering, dynamic parameter setting.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1080484
