Hierarchical Clustering Algorithms in Data Mining

Z. Abdullah; A. R. Hamdan

Abstract:

Clustering is the process of grouping data objects into clusters so that objects in the same cluster are similar to each other. Clustering is one of the areas of data mining, and clustering algorithms can be classified into partition-based, hierarchical, density-based and grid-based methods. In this paper we survey and review four major hierarchical clustering algorithms: CURE, ROCK, CHAMELEON and BIRCH. The resulting overview of the state of the art of these algorithms should help in addressing current problems and in deriving more robust and scalable clustering algorithms.

Keywords: Clustering, method, algorithm, hierarchical, survey.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1109341
