**Commenced**in January 2007

**Frequency:**Monthly

**Edition:**International

**Paper Count:**30184

##### A Genetic Algorithm for Clustering on Image Data

**Authors:**
Qin Ding,
Jim Gasvoda

**Abstract:**

Clustering is the process of subdividing an input data set into a desired number of subgroups so that members of the same subgroup are similar and members of different subgroups have diverse properties. Many heuristic algorithms have been applied to the clustering problem, which is known to be NP Hard. Genetic algorithms have been used in a wide variety of fields to perform clustering, however, the technique normally has a long running time in terms of input set size. This paper proposes an efficient genetic algorithm for clustering on very large data sets, especially on image data sets. The genetic algorithm uses the most time efficient techniques along with preprocessing of the input data set. We test our algorithm on both artificial and real image data sets, both of which are of large size. The experimental results show that our algorithm outperforms the k-means algorithm in terms of running time as well as the quality of the clustering.

**Keywords:**
Clustering,
data mining,
genetic algorithm,
image data.

**Digital Object Identifier (DOI):**
doi.org/10.5281/zenodo.1081309

**References:**

[1] P. K. Agarwal and C. M. Procopiuc, "Exact and approximation algorithms for clustering," in Proceedings of the ninth annual ACMSIAM symposium on Discrete algorithms, 1998, pp. 658-667.

[2] P. J. Angeline, "Adaptive and self-adaptive evolutionary computations," Computational Intelligence: A Dynamic System Perspective, Piscataway, IEEE Press, 1995, pp. 152-163.

[3] A. Demiriz, K. P. Bennett, and M. J. Embrechts, "Semi-supervised clustering using genetic algorithms," R.P.I. Math Report No. 9901, Rensselaer Polytechnic Institute, 1999.

[4] W. DuMouchel, C. Volinsky, T. Johnson, C. Cortes, and D. Pregibon, "Squashing flat files flatter," in Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1999, pp. 6-15.

[5] V. Estivill-Castro and A.T. Murray, "Spatial clustering for data mining with genetic algorithms," in Proceedings of the International ICSC Symposium on Engineering of Intelligent Systems, 1998, pp. 317-323.

[6] J. Grabmeier and A. Rudolph, "Techniques of cluster algorithms in data mining," Data Mining and Knowledge Discover, 6, 2002, pp. 303-360.

[7] L. O Hall, I. B. Ozyurt, , J. C. Bezdek, "Clustering with a genetically optimized approach," IEEE Transactions on Evolutionary Computation, 3(2), 1999, pp. 103-112.

[8] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2000.

[9] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing Surveys, 31(3), 1999, pp. 264-323.

[10] C.-Y. Lee and E. K.Antonsson, "Dynamic partitional clustering using evolution strategies," in Proceedings of the Third Asia Pacific Conference on Simulated Evolution and Learning, 2000.

[11] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press, 1998.

[12] M. Painho and F. Ba├º├úo, "Using genetic algorithms in clustering problems," in Proceedings of GeoComputation Conference, 2000.