A Review: Comparative Analysis of Different Categorical Data Clustering Ensemble Methods

Authors: S. Sarumathi, N. Shanthi, M. Sharmila

Abstract:

Over the past years a large amount of work has been done on data clustering, an unsupervised learning technique in data mining. Many algorithms and methods have been proposed, focusing on clustering different data types, on the representation of cluster models, and on the accuracy of the resulting clusters. However, no single clustering algorithm has proved to be the most efficient or to give the best results in every case. To address this issue, a new technique called the cluster ensemble method emerged. The cluster ensemble is a good alternative approach to the cluster analysis problem. Its main aim is to merge different clustering solutions in a way that achieves higher accuracy and improves on the quality of the individual clusterings. Given the substantial and continuing development of new methods in data mining, and the persistent interest in inventing new algorithms, a critical analysis of the existing techniques and of future directions has become necessary. This paper presents a comparative study of different cluster ensemble methods, covering their features, their systematic working process, and the average accuracy and error rates of each method. This comprehensive analysis should be useful to the community of clustering practitioners and should help in choosing the most suitable method for the problem at hand.

Keywords: Clustering, Cluster Ensemble methods, Co-association matrix, Consensus function, Median partition.
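
As an illustration of two of the keywords above, the following is a minimal sketch of a co-association matrix and a simple consensus function built on it. It is not taken from the paper: the function names, the 0.5 threshold, and the rule of linking objects whose co-association exceeds that threshold are assumptions chosen for brevity, and the categorical base clusterings are assumed to be given already as lists of cluster labels.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix whose entry (i, j) is the fraction of
    base partitions that place objects i and j in the same cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus(partitions, threshold=0.5):
    """Hypothetical consensus function: link every pair of objects whose
    co-association exceeds `threshold`, then return the connected
    components as consensus cluster labels (numbered in order of first
    appearance)."""
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))  # union-find forest over the objects

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Three base clusterings of six objects (illustrative data only).
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
]
print(consensus(base))
```

In this toy input the three base clusterings disagree about objects 2 and 5, and the majority relation recorded in the co-association matrix resolves both. Practical ensemble methods usually apply a stronger consensus step, such as hierarchical or graph-based clustering of the same matrix.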

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1336474

