Analysis of Diverse Cluster Ensemble Techniques

Authors: S. Sarumathi, N. Shanthi, P. Ranjetha

Abstract:

Data mining is the process of discovering interesting patterns in large amounts of data. One of the most important supporting processes for faster data access is clustering: the process of measuring similarity between data objects according to the characteristics present in the data and grouping related objects into clusters. A cluster ensemble combines the results of multiple runs of different clustering algorithms into a single consensus partition of the original dataset, consolidating a collection of individual clustering outcomes. The performance of a clustering ensemble is mainly affected by two principal factors: diversity and quality. This paper presents an overview of different cluster ensemble algorithms, together with the methods they use to improve diversity and quality, provides a comparative analysis of the different cluster ensembles, and summarizes various cluster ensemble methods. This analysis should therefore be useful to clustering experts and help in deciding on the most appropriate method for the problem at hand.

Keywords: Cluster Ensemble, Consensus Function, CSPA, Diversity, HGPA, MCLA.
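The abstract describes the general cluster ensemble pipeline: run several base clusterings, then apply a consensus function to merge them into one partition. The sketch below illustrates one common consensus function, evidence accumulation via a co-association matrix (the approach surveyed alongside CSPA/HGPA/MCLA in this literature). It is a minimal illustration, not the paper's own method; the dataset, the choice of k-means as the base clusterer, the varying values of k, and the average-linkage cut are all illustrative assumptions.

```python
# Minimal cluster-ensemble sketch: evidence accumulation via a
# co-association matrix. All parameters here are illustrative.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy dataset (assumed, not from the paper).
X, _ = make_blobs(n_samples=100, centers=3, random_state=0)
n = len(X)

# 1) Generate a diverse ensemble: k-means runs with different
#    random seeds and different numbers of clusters.
labelings = [
    KMeans(n_clusters=k, n_init=1, random_state=seed).fit_predict(X)
    for seed, k in enumerate([2, 3, 3, 4, 5])
]

# 2) Co-association matrix: fraction of runs in which each pair of
#    points falls into the same cluster.
co = np.zeros((n, n))
for labels in labelings:
    co += (labels[:, None] == labels[None, :])
co /= len(labelings)

# 3) Consensus function: average-linkage hierarchical clustering on
#    the co-association distances (1 - co), cut into 3 clusters.
dist = squareform(1.0 - co, checks=False)  # condensed distance vector
Z = linkage(dist, method="average")
consensus = fcluster(Z, t=3, criterion="maxclust")
```

Diversity here comes from varying both the random initialization and k across runs; quality comes from the consensus step, which tends to smooth out the errors of any single base clustering.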

Digital Object Identifier (DOI): https://doi.org/10.5281/zenodo.1111739


References:


[1] Sandro Vega-Pons and José Ruiz-Shulcloper, “A Survey of Clustering Ensemble Algorithms”, International Journal of Pattern Recognition and Artificial Intelligence, Vol. 25, No. 3, pp.337-372, 2011.
[2] Hongjun Wang, Hanhuai Shan, Arindam Banerjee, “Bayesian Cluster Ensembles”, Statistical Analysis and Data Mining, 2011.
[3] A. Strehl and J. Ghosh, “Cluster ensembles - a knowledge reuse framework for combining multiple partitions”, JMLR, 3: pp.583–617, 2002.
[4] A. Topchy, A. Jain, and W. Punch, “A mixture model for clustering ensembles”, In SDM, pp. 379–390, 2004.
[5] S. Sarumathi, N. Shanthi, M. Sharmila, “A Review: Comparative Analysis of Different Categorical Data Clustering Ensemble Methods”, International Journal of Computer, Information Science and Engineering Vol.7, No.12, 2013.
[6] Fern X. Z, Lin W, “Cluster ensemble selection”, Stat. Anal. Data Mining 1 (3), pp. 128–141, 2008.
[7] Fred A.L, Jain A.K, “Combining multiple clusterings using evidence accumulation”, IEEE Trans. Pattern Anal. Mach. Intell. 27 (6), pp. 835–850, 2005.
[8] Hadjitodorov S.T, Kuncheva L.I, Todorova L.P, “Moderate diversity for better cluster ensembles”, Inf. Fusion 7 (3), pp.264–275, 2006.
[9] Hong Y, Kwong S, Wang H, Ren Q, “Resampling-based selective clustering ensembles”, Pattern Recognit. Lett. 30 (3), pp. 298–305, 2009.
[10] Jia J, Xiao X, Liu B, Jiao L, “Bagging-based spectral clustering ensemble selection”, Pattern Recognit. Lett. 32 (10), pp. 1456–1467, 2011.
[11] Mimaroglu S, Erdil E, “An efficient and scalable family of algorithms for combining clusterings”, Eng. Appl. Artif. Intell. 26 (10), pp. 2525–2539, 2013.
[12] A. Gionis, H. Mannila, P. Tsaparas, “Clustering aggregation”, In Proceedings of the 21st International Conference on Data Engineering (ICDE’05), pp. 341–352, 2005.
[13] X. Z. Fern and C. E. Brodley. “Solving cluster ensemble problems by bipartite graph partitioning”, In Proceedings of the Twenty First International Conference on Machine Learning, pp. 281–288, 2004.
[14] A. Ng, M. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm”, In Advances in Neural Information Processing Systems 14, pp 849–856, 2002.
[15] X. Hu, I. Yoo, “Cluster ensemble and its applications in gene expression analysis”, in: Y.-P.P. Chen (Ed.), Proc. 2nd Asia-Pacific Bioinformatics Conference (APB2004), Dunedin, New Zealand, pp. 297–302, 2004.
[16] L. Hubert, P. Arabie, “Comparing partitions”, Journal of Classification 2, pp. 193–218, 1985.
[17] D. Greene, A. Tsymbal, N. Bolshakova, P. Cunningham, “Ensemble clustering in medical diagnostics”, in: R. Long et al. (Eds.), Proc. 17th IEEE Symp. on Computer-Based Medical Systems CBMS_2004, Bethesda, MD, National Library of Medicine/National Institutes of Health, IEEE CS Press, pp. 576– 581, 2004.
[18] Li Zheng, Tao Li, Chris Ding, “A Framework for Hierarchical Ensemble Clustering”, ACM Transactions on Knowledge Discovery from Data, Vol. 9, No. 2, Article 9, 2014.
[19] X. Z. Fern and C. E. Brodley, “Random projection for high dimensional data clustering: A cluster ensemble approach”, In Proceedings of the Twentieth International Conference on Machine Learning, pp. 186– 193, 2003.
[20] D. Greene, A. Tsymbal, N. Bolshakova, P. Cunningham, “Ensemble clustering in medical diagnostics”, in: R. Long et al. (Eds.), Proc. 17th IEEE Symp. on Computer-Based Medical Systems CBMS_2004, Bethesda, MD, National Library of Medicine/National Institutes of Health, IEEE CS Press, pp. 576–581, 2004.
[21] A. Strehl, J. Ghosh, “Cluster ensembles—a knowledge reuse framework for combining partitionings”, in: Proc. of 11th National Conf. on Artificial Intelligence, NCAI, Edmonton, Alberta, Canada, pp. 93–98, 2002.
[22] A. Topchy, A.K. Jain, W. Punch, “Combining multiple weak clusterings”, in: Proceedings of IEEE Int. Conf. on Data Mining, Melbourne, Australia, pp. 331–338, 2003.
[23] Dudoit S, Fridlyand J, “Bagging to improve the accuracy of a clustering procedure”, Bioinformatics 19 (9), 2003.
[24] Fern X.Z, Brodley C.E, “Random projection for high dimensional data clustering: A cluster ensemble approach”, In: Proc. 20th Internat. Conf. Machine Learning, vol. 20, pp. 186–191, 2003.
[25] Fischer B, Buhmann J.M, “Bagging for path-based clustering”, IEEE Trans. Pattern Anal. Machine Intell.25 (11), pp. 1411–1415, 2003.
[26] Minaei-Bidgoli B, Topchy A, Punch W.F, “A comparison of resampling methods for clustering ensembles”, In: Internat. Conf. on Machine Learning, Models, Technologies and Applications (MLMTA 2004), pp. 939–945, 2004a.
[27] Minaei-Bidgoli B, Topchy A, Punch W. F, “Ensembles of partitions via data resampling”, In: Proc. Internat. Conf. on Information Technology: Coding and Computing (ITCC’04), vol. 2, pp. 188–192, 2004b.
[28] Fowlkes C, Belongie S, Chung F, Malik J, “Spectral grouping using the Nyström method”, IEEE Trans. Pattern Anal. Machine Intell. 26 (2), pp. 214–225, 2004.
[29] Mimaroglu S, Erdil E, “Asod: arbitrary shape object detection”, Engineering Applications of Artificial Intelligence 24, pp. 1295–1299, 2011a.
[30] Mimaroglu S, Erdil E, “Combining multiple clusterings using similarity graph”, Pattern Recognition 44, pp. 694–703, 2011b.
[31] Iam-on N, Boongoen T, Garrett S, “LCE: a link-based cluster ensemble method for improved gene expression data analysis”, Bioinformatics 26, pp. 1513–1519, 2010.
[32] Karypis G, Kumar V, “A fast and high quality multilevel scheme for partitioning irregular graphs”, SIAM Journal on Scientific Computing 20, 359, 1999.
[33] Karypis G, Aggarwal R, Kumar V, Shekhar S, “Multilevel hypergraph partitioning: application in VLSI domain”, In: Proceedings of the 34th Annual Conference on Design automation, ACM New York, NY, USA, pp. 526–529, 1997.
[34] Hore P, Hall L, Goldgof D, “A scalable framework for cluster ensemble”, Pattern Recognition 42, pp. 676–688, 2009.
[35] Thomas M. Cover and Joy A. Thomas, “Elements of Information Theory”, Wiley, 1991.
[36] J. Podani, “Simulation of random dendrograms and comparison tests: Some comments”, Journal of Classification 17, pp. 123–142, 2000.
[37] P.N. Tan, M. Steinbach, V. Kumar, “Introduction to Data Mining (1st ed.)”, Addison-Wesley Longman, Boston, MA, 2005.
[38] J. Wu, H. Xiong, J. Chen, “Towards understanding hierarchical clustering: A data distribution perspective”, Neurocomputing 72 (10–12), pp. 2319–2330, 2009.
[39] Y. Zhao and G. Karypis, “Evaluation of hierarchical clustering algorithms for document datasets”, In Proceedings of the 11th International Conference on Information and Knowledge Management (CIKM’02), ACM, New York, NY, pp. 515–524, 2002.
[40] L. Zheng, T. Li, C. H. Q. Ding, “Hierarchical ensemble clustering”, In ICDM’10, pp.1199–1204, 2010.
[41] J. Azimi and X. Fern, “Adaptive cluster ensemble selection”, In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’09), pp. 992–997, 2009.
[42] E. N. Adams, “N-trees as nestings: Complexity, similarity, and consensus”, Journal of Classification 3, pp. 299–317, 1986.
[43] E. N. Adams III, “Consensus techniques and the comparison of taxonomic trees”, Systematic Zoology 21, 4, pp. 390–397, 1972.
[44] N. Ailon and M. Charikar, “Fitting tree metrics: Hierarchical clustering and phylogeny”, In Proceedings of the Symposium on Foundations of Computer Science, pp.73–82, 2005.