Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 30184
Initializing K-Means using Genetic Algorithms

Authors: Bashar Al-Shboul, Sung-Hyon Myaeng

Abstract:

K-Means (KM) is considered one of the major algorithms widely used in clustering. However, it still has some problems, and one of them is in its initialization step where it is normally done randomly. Another problem for KM is that it converges to local minima. Genetic algorithms are one of the evolutionary algorithms inspired from nature and utilized in the field of clustering. In this paper, we propose two algorithms to solve the initialization problem, Genetic Algorithm Initializes KM (GAIK) and KM Initializes Genetic Algorithm (KIGA). To show the effectiveness and efficiency of our algorithms, a comparative study was done among GAIK, KIGA, Genetic-based Clustering Algorithm (GCA), and FCM [19].

Keywords: Clustering, Genetic Algorithms, K-means.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1080070

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1689

References:


[1] R.J. Bauer, "Genetic Algorithms and Investment Strategies", Wiley Finance Editions, New York, John Wiley and Sons, 1994.
[2] H. Ben Amor, and A. Rettinger, "Intelligent exploration for genetic algorithms: Using self-organizing maps in evolutionary computation", In proceedings of the 2005 conference on Genetic and evolutionary computation, 2005, pp. 1531-1538.
[3] P. Bradley, and U. Fayyad, "Refining Initial Points for K-Means Clustering," In Proceeding of 15th International Conference on Machine Learning, 1998, pp. 91-99.
[4] P. Chi, "Genetic Search with Proportion Estimation", In proceedings of the Third Int. Con. on Genetic Algorithms (ICGA), San Mateo, California, 1989, pp. 92-97.
[5] U. Fayyad, C. Reina, and J. Bradley, "Initialization of iterative refinement clustering algorithms", In proceedings of Fourth Int. Con. on Knowledge Discovery and Data Mining, AAAI, pp.194-198, 1998.
[6] C. Fraley, and A. Raftery, "How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis," Technical Report 329, Dept. of Statistics, University of Washington, Seattle, 1998.
[7] D. Goldberg, "Computer-Aided Gas Pipeline Operation Using Genetic Algorithms And Rule Learning," Ph.D. thesis, University of Michigan, Ann Arbor, 1983.
[8] D. Goldberg, "Genetic Algorithms In Search, Optimization, And Machine Learning," Addison-Wesley, New York, 1989.
[9] J. Grefenstette, C. Ramsey, and A. Schultz, Learning Sequential Decision Rules Using Simulation Models And Competition, Machine Learning 5, Vol. 4, 1990, pp. 355-381.
[10] L. Hall, B. Ozyurt, and J. Bezdek, "Clustering With A Genetically Optimized Approach," IEEE Transactions on Evolutionary computation, Vol. 3, No. 2, 1999, pp 103-112.
[11] J. Han, and M. Kamber, "Data Mining: Concepts And Techniques," Morgan Kaufmann Publishers, 2001.
[12] J. Holland, "Adaptation In Natural And Artificial Systems," University of Michigan Press, 1975.
[13] L. Kaufman, and P. Rousseeuw, "Finding Groups In Data: And Introduction To Cluster Analysis," John Wiley and Sons, 1990.
[14] A. Korol, I. Preygel, and S. Preygel, "Algorithms Of Estimation And Population-Genetic Models," Central Library of Imperial College, Chapman and Hall, 1994.
[15] K. Krishna, and M. Murty, "Genetic K-Means Algorithm," IEEE Transactions on Systems, Vol. 29, NO. 3, 1999, pp. 433-439.
[16] Y. Lu, S. Lu, F. Fotouhi, Y. Deng, and S. Brown, "FGKA: A Fast Genetic K-Means Clustering Algorithm," ACM Symposium on Applied Computing, 2004
[17] Y. Lu, S. Lu, F. Fotouhi, Y. Deng, and S. Brown, "Incremental Genetic K-Means Algorithm And Its Application In Gene Expression Data Analysis," BMC Bioinformatics, 2004.
[18] J. MacQueen, "Some Methods For Classification And Analysis Of Multivariate Observations," In proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1967, pp. 281-297.
[19] U. Maulik, and S. Bandyopadhyay, "Genetic Algorithm-Based Clustering Technique", Pattern Recognition 33, 1999, pp. 1455-1465.
[20] J. Pena, J. Lozano, and P. Larranaga, "An Empirical Comparison of Four Initialization Methods For The K-Means Algorithm," Pattern Recognition Letters, Vol. 20 No. 10, 1999, pp. 1027-10.