A Review and Comparative Analysis on Cluster Ensemble Methods

Authors: S. Sarumathi, P. Ranjetha, C. Saraswathy, M. Vaishnavi, S. Geetha


Clustering is an unsupervised learning technique in data mining that groups data objects into meaningful classes so that intra-cluster similarity is maximized and inter-cluster similarity is minimized. However, no single clustering algorithm consistently produces the best result on every dataset. To address this problem, the cluster ensemble approach has emerged as a challenging new technique and has proved to be an effective approach to cluster analysis. The main goal of a cluster ensemble is to combine multiple clustering solutions in a way that preserves precision while improving on the quality of any individual clustering. Because new approaches in the field of data mining are created rapidly and in large numbers, the ongoing interest in inventing novel algorithms calls for a thorough examination of current techniques alongside future innovation. This paper presents a comparative analysis of various cluster ensemble approaches, covering their methodologies, formal working processes, and standard accuracy and error rates. This exploratory and concise survey should benefit the community of clustering practitioners by helping them determine the most appropriate method for the problem at hand.

Keywords: Clustering, cluster ensemble methods, consensus function, data mining, unsupervised learning.
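To make the consensus-function idea concrete, the following is a minimal sketch of one of the families such surveys cover: evidence-accumulation-style consensus. It builds a co-association matrix from several base partitions and extracts a consensus partition by linking object pairs that co-cluster often. The function names, the 0.5 threshold, and the toy labelings are illustrative assumptions, not the method of any particular paper.

```python
import numpy as np

def co_association(partitions):
    """Co-association matrix: entry (i, j) is the fraction of base
    partitions that place objects i and j in the same cluster."""
    partitions = np.asarray(partitions)       # shape: (n_partitions, n_objects)
    m, n = partitions.shape
    co = np.zeros((n, n))
    for labels in partitions:
        co += (labels[:, None] == labels[None, :]).astype(float)
    return co / m

def consensus_clusters(partitions, threshold=0.5):
    """Consensus partition: link every pair whose co-association meets the
    threshold, then take connected components as the final clusters."""
    co = co_association(partitions)
    n = co.shape[0]
    adjacency = co >= threshold
    labels = -np.ones(n, dtype=int)           # -1 marks "not yet assigned"
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]                        # depth-first flood fill
        while stack:
            node = stack.pop()
            for nbr in np.where(adjacency[node])[0]:
                if labels[nbr] == -1:
                    labels[nbr] = current
                    stack.append(nbr)
        current += 1
    return labels

# Three (hypothetical) base clusterings of six objects: the first four
# objects co-cluster in most partitions, the last two form another group.
base = [
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
]
print(consensus_clusters(base))               # [0 0 0 0 1 1]
```

Note that the base labels need not agree symbolically (cluster "0" in one partition may be "1" in another); the co-association matrix sidesteps the label-correspondence problem entirely, which is one reason this family of consensus functions is popular.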

