Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 31113
A Distributed Algorithm for Intrinsic Cluster Detection over Large Spatial Data

Authors: Sauravjyoti Sarmah, Rosy Das, Dhruba Kr. Bhattacharyya


Clustering algorithms help to understand the hidden information present in datasets. A dataset may contain intrinsic and nested clusters, the detection of which is of utmost importance. This paper presents a Distributed Grid-based Density Clustering algorithm capable of identifying arbitrary shaped embedded clusters as well as multi-density clusters over large spatial datasets. For handling massive datasets, we implemented our method using a 'sharednothing' architecture where multiple computers are interconnected over a network. Experimental results are reported to establish the superiority of the technique in terms of scale-up, speedup as well as cluster quality.

Keywords: Clustering, Density-based, Grid-based, Adaptive Grid

Digital Object Identifier (DOI):

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1322


[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques. India: Morgan Kaufmann Publishers, 2004.
[2] M. Ester, H. P. Kriegel, J. Sander and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", in International Conference on Knowledge Discovery in Databases and Data Mining (KDD-96), Portland, Oregon, 1996, pp. 226-231.
[3] C. Hsu and M. Chen, "Subspace Clustering of High Dimensional Spatial Data with Noises", PAKDD, 2004, pp. 31-40.
[4] W. Wang, J. Yang, and R. R. Muntz, "STING: A Statistical Information Grid Approach to Spatial data Mining", in Proc. 23rd International Conference on Very Large Databases, (VLDB), Athens, Greece, Morgan Kaufmann Publishers, 1997, pp. 186 - 195.
[5] G. Sheikholeslami, S. Chatterjee and A. Zhang, "Wavecluster: A Multiresolution Clustering approach for very large spatial database", in SIGMOD'98, Seattle, 1998.
[6] R. Agrawal, J. Gehrke, D. Gunopulos and P. Raghavan, "Automatic subspace clustering of high dimensional data for data mining applications", in SIGMOD Record ACM Special Interest Group on Management of Data, 1998, pp. 94-105.
[7] H. S. Nagesh, S. Goil and A. N. Choudhary, "A scalable parallel subspace clustering algorithm for massive data sets", in Proc. International Conference on Parallel Processing, 2000, pp. 477.
[8] L. Ertoz, M. Steinbach and V. Kumar, "Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data", in SIAM International Conference on Data Mining (SDM '03), 2003.
[9] G. Karypis, Han and V. Kumar, "CHAMELEON: A hierarchical clustering algorithm using dynamic modeling", IEEE Computer, 32(8), pp 68-75, 1999.
[10] Y. Zhao, S. Mei, X. Fan, S. Jun-de. 2003. Clustering Datasets Containing Clusters of Various Densities. Journal of Beijing University of Posts and Telecommunications, 26(2):42-47.
[11] H. S. Kim, S. Gao, Y. Xia, G. B. Kim and H. Y. Bae, "DGCL: An Efficient Density and Grid Based Clustering Algorithm for Large Spatial Database", Advances in Web-Age Information Management (WAIM'06), pp. 362-371, 2006.
[12] M. Ankerst, M. M. Breuing, H. P. Kriegel and J. Sander, "OPTICS: Ordering Points To Identify the Clustering Structure", in ACMSIGMOD, pp. 49-60, 1999.
[13] S. Roy and D. K. Bhattacharyya, "An Approach to Find Embedded Clusters Using Density Based Techniques", in Proc. ICDCIT, LNCS 3816, pp. 523-535, 2005.
[14] B. Borah, D. K. Bhattacharyya and R. K. Das, "A Parallel Density-Based Data Clustering Technique on Distributed Memory Multicomputers", in Proc. ADCOM, Ahmedabad, 2004.
[15] I. S. Dhilon and D. S. Modha, "A Data-Clustering Algorithm on Distributed Memory Multiprocessors", in International Conference on Knowledge Discovery and Data Mining (SIGKDD 99), 1999.
[16] X. Xu, J. Jager and H. P. Kriegel, "A Fast Parallel Clustering Algorithm for Large Spatial Databases", Data Mining and Knowledge Discovery, 3, Kluwer Academic Publisher, pp. 263-290, 1999.
[17] E. Januzaj, H. P. Kriegel and M. Pfeifle, "Towards Effective and Efficient Distributed Clustering.Workshop on Clustering Large Data Sets", ICDM'03.Melbourne, Florida, 2003.
[18] D. Foti, D. Lipari, C. Pizzuti and D. Talia, "Scalable Parallel Clustering for Data Mining on Multicomputers",15 IPDPS workshops, pp. 390-398, 2000.
[19] E.K. Johnson and H. Kargupta, "Collective Hierarchical Clustering from Distributed, Heterogeneous Data", Large Scale Parallel data Mining, LNCS 1759, Springer, 2000.
[20] S. Sarmah, R. Das and D. K. Bhattacharyya, "Intrinsic Cluster Detection Using Adaptive Grids", in Proc. ADCOM'07, Guwahati, 2007.
[21] Available: http// /cgindex/math /barycentric.html