Commenced in January 2007
Paper Count: 30172
A Distributed Algorithm for Intrinsic Cluster Detection over Large Spatial Data
Abstract:Clustering algorithms help to understand the hidden information present in datasets. A dataset may contain intrinsic and nested clusters, the detection of which is of utmost importance. This paper presents a Distributed Grid-based Density Clustering algorithm capable of identifying arbitrary shaped embedded clusters as well as multi-density clusters over large spatial datasets. For handling massive datasets, we implemented our method using a 'sharednothing' architecture where multiple computers are interconnected over a network. Experimental results are reported to establish the superiority of the technique in terms of scale-up, speedup as well as cluster quality.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1061294Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1269
 J. Han and M. Kamber, Data Mining: Concepts and Techniques. India: Morgan Kaufmann Publishers, 2004.
 M. Ester, H. P. Kriegel, J. Sander and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", in International Conference on Knowledge Discovery in Databases and Data Mining (KDD-96), Portland, Oregon, 1996, pp. 226-231.
 C. Hsu and M. Chen, "Subspace Clustering of High Dimensional Spatial Data with Noises", PAKDD, 2004, pp. 31-40.
 W. Wang, J. Yang, and R. R. Muntz, "STING: A Statistical Information Grid Approach to Spatial data Mining", in Proc. 23rd International Conference on Very Large Databases, (VLDB), Athens, Greece, Morgan Kaufmann Publishers, 1997, pp. 186 - 195.
 G. Sheikholeslami, S. Chatterjee and A. Zhang, "Wavecluster: A Multiresolution Clustering approach for very large spatial database", in SIGMOD'98, Seattle, 1998.
 R. Agrawal, J. Gehrke, D. Gunopulos and P. Raghavan, "Automatic subspace clustering of high dimensional data for data mining applications", in SIGMOD Record ACM Special Interest Group on Management of Data, 1998, pp. 94-105.
 H. S. Nagesh, S. Goil and A. N. Choudhary, "A scalable parallel subspace clustering algorithm for massive data sets", in Proc. International Conference on Parallel Processing, 2000, pp. 477.
 L. Ertoz, M. Steinbach and V. Kumar, "Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data", in SIAM International Conference on Data Mining (SDM '03), 2003.
 G. Karypis, Han and V. Kumar, "CHAMELEON: A hierarchical clustering algorithm using dynamic modeling", IEEE Computer, 32(8), pp 68-75, 1999.
 Y. Zhao, S. Mei, X. Fan, S. Jun-de. 2003. Clustering Datasets Containing Clusters of Various Densities. Journal of Beijing University of Posts and Telecommunications, 26(2):42-47.
 H. S. Kim, S. Gao, Y. Xia, G. B. Kim and H. Y. Bae, "DGCL: An Efficient Density and Grid Based Clustering Algorithm for Large Spatial Database", Advances in Web-Age Information Management (WAIM'06), pp. 362-371, 2006.
 M. Ankerst, M. M. Breuing, H. P. Kriegel and J. Sander, "OPTICS: Ordering Points To Identify the Clustering Structure", in ACMSIGMOD, pp. 49-60, 1999.
 S. Roy and D. K. Bhattacharyya, "An Approach to Find Embedded Clusters Using Density Based Techniques", in Proc. ICDCIT, LNCS 3816, pp. 523-535, 2005.
 B. Borah, D. K. Bhattacharyya and R. K. Das, "A Parallel Density-Based Data Clustering Technique on Distributed Memory Multicomputers", in Proc. ADCOM, Ahmedabad, 2004.
 I. S. Dhilon and D. S. Modha, "A Data-Clustering Algorithm on Distributed Memory Multiprocessors", in International Conference on Knowledge Discovery and Data Mining (SIGKDD 99), 1999.
 X. Xu, J. Jager and H. P. Kriegel, "A Fast Parallel Clustering Algorithm for Large Spatial Databases", Data Mining and Knowledge Discovery, 3, Kluwer Academic Publisher, pp. 263-290, 1999.
 E. Januzaj, H. P. Kriegel and M. Pfeifle, "Towards Effective and Efficient Distributed Clustering.Workshop on Clustering Large Data Sets", ICDM'03.Melbourne, Florida, 2003.
 D. Foti, D. Lipari, C. Pizzuti and D. Talia, "Scalable Parallel Clustering for Data Mining on Multicomputers",15 IPDPS workshops, pp. 390-398, 2000.
 E.K. Johnson and H. Kargupta, "Collective Hierarchical Clustering from Distributed, Heterogeneous Data", Large Scale Parallel data Mining, LNCS 1759, Springer, 2000.
 S. Sarmah, R. Das and D. K. Bhattacharyya, "Intrinsic Cluster Detection Using Adaptive Grids", in Proc. ADCOM'07, Guwahati, 2007.
 Available: http//steve.hollasch.net /cgindex/math /barycentric.html