A Survey: Clustering Ensembles Techniques

Authors: Reza Ghaemi, Md. Nasir Sulaiman, Hamidah Ibrahim, Norwati Mustapha

Abstract:

Clustering ensembles combine multiple partitions generated by different clustering algorithms into a single clustering solution. They have emerged as a prominent method for improving the robustness, stability and accuracy of unsupervised classification solutions. Many contributions have addressed the problem of finding a consensus clustering, and the design of the consensus function remains one of the major problems in clustering ensembles. In this paper, we first introduce clustering ensembles, the representation of multiple partitions and its challenges, and present a taxonomy of combination algorithms. Second, we describe consensus functions in clustering ensembles, including hypergraph partitioning, the voting approach, mutual information, co-association based functions and the finite mixture model, and explain their advantages, disadvantages and computational complexity. Finally, we compare the characteristics of clustering ensemble algorithms, such as computational complexity, robustness, simplicity and accuracy, as reported on different datasets for previous techniques.
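
To make the idea of a consensus function concrete, the following is a minimal sketch of one co-association based approach. It is an illustration under stated assumptions, not the method of any specific surveyed paper: the function names `co_association` and `consensus`, the 0.5 threshold and the toy partitions are all hypothetical choices. Each input partition is a list of cluster labels, one per object; label values need not agree across partitions.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    partitions that place objects i and j in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus(partitions, threshold=0.5):
    """Assumed consensus rule: link every pair whose co-association
    exceeds `threshold`, then return connected components as labels."""
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))  # union-find forest over the objects

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(j)] = find(i)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three toy partitions of six objects (hypothetical data). The second
# one uses swapped label names and misplaces object 2.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [2, 2, 2, 0, 0, 0],
]
labels = consensus(partitions)
print(labels)
```

Because the co-association matrix only records whether two objects share a cluster, the sketch sidesteps the label correspondence problem between partitions. The price is the quadratic cost in the number of objects of building the matrix.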

Keywords: Clustering Ensembles, Combinational Algorithm, Consensus Function, Unsupervised Classification.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1329276

