Density Clustering Based On Radius of Data (DCBRD)
Authors: A.M. Fahim, A. M. Salem, F. A. Torkey, M. A. Ramadan
Abstract:
Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, a density based clustering algorithm (DCBRD) is presented, relying on a knowledge acquired from the data by dividing the data space into overlapped regions. The proposed algorithm discovers arbitrary shaped clusters, requires no input parameters and uses the same definitions of DBSCAN algorithm. We performed an experimental evaluation of the effectiveness and efficiency of it, and compared this results with that of DBSCAN. The results of our experiments demonstrate that the proposed algorithm is significantly efficient in discovering clusters of arbitrary shape and size.
Keywords: Clustering Algorithms, Arbitrary Shape of clusters, cluster Analysis.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1062378
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1874References:
[1] Beckmann N., Kriegel H.-P., Schneider R, and Seeger B. "The R*- tree: An Efficient and Robust Access Method for Points and Rectangles", Proc. ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, NJ, 1990, pp. 322-331.
[2] DEFAYS, D. An efficient algorithm for a complete link method. The Computer Journal, vol. 20, 1977, pp. 364-366.
[3] Ester M., Kriegel H.-P., Sander J., Xu X.: "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, pp. 226-231.
[4] Ester M., Kriegel H-P., Sander J., Xu X. "Clustering for Mining in Large Spatial Databases", Special Issue on Data Mining, KIJournal, ScienTec Publishing, Vol. 1, 1998, pp. 1-7.
[5] Fayyad U., Piatetsky G., Smyth P., Uthurusay R., "Advances in knowledge discovery", AAAI press, Cambridge,1996.
[6] Gordon A. D. "A review of hierarchical classification", Journal of the Royal statistical society. Series A, Vol.150, 1987, pp. 119-137.
[7] Guha S., Rastogi R., Shim K.: "CURE: An Efficient Clustering Algorithms for Large Databases", Proc. ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, 1998, pp. 73-84.
[8] HAN J., KAMBER M., and TUNG A. K. H. "Spatial clustering methods in data mining: A survey". Taylor and Francis, 2001.
[9] Krznaric D. and Levcopoulos C. "Optimal algorithms for complete linkage clustering in d dimensions". Theor. Comput. Sci., 286(1), 2002, pp. 139-149.
[10] Kriegel. H-P., Peer K., and Irina G., "Incremental OPTICS: Efficient Computation of Updates in a Hierarchical Cluster Ordering.", Proc. 5th Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK'03), Prague, Czech Rep., 2003, pp. 224-233.
[11] Kaufman L., Rousseeuw P. J.: "Finding Groups in Data: An Introduction to Cluster Analysis", John Wiley & Sons, 1990.
[12] MacQueen, J.: "Some Methods for Classification and Analysis of Multivariate Observations", 5th Berkeley Symp. Math. Statist. Prob., Vol. 1, 1967, pp. 281-297.
[13] Ng R. T., Han J.: "Efficient and Effective Clustering Methods for Spatial Data Mining", Proc. 20th Int. Conf. On Very Large Data Bases, Santiago, Chile, Morgan Kaufmann Publishers, San Francisco, CA, 1994, pp. 144-155.
[14] Sibson R.: "SLINK: an optimally efficient algorithm for the singlelink cluster method". The Computer Journal, Vol. 16, No. 1, 1973, pp. 30-34.
[15] Shi-hong Y.,Ping L., Ji-dog G.and Shui-geng Z."A Statistical Information-based clustering Approach in distance space", JZUS, vol. 6A(1), 2005, pp. 71-78.
[16] Voorhees, E.M. "Implementing agglomerative hierarchical clustering algorithms for use in document retrieval". Information Processing and Management, 22, 6, 1986, pp. 465-476.
[17] Zhang T., Ramakrishnan R., Linvy M.: "BIRCH: An Efficient Data Clustering Method for Very Large Databases". Proc. ACM SIGMOD Int. Conf. on Management of Data, ACM Press, New York, 1996, pp.103-114.