Billel Kenidra and Mohamed Benmohammed
An Improved KMeans Algorithm for Gene Expression Data Clustering
497 - 503
2018
12
7
International Journal of Computer and Information Engineering
https://publications.waset.org/pdf/10009173
https://publications.waset.org/vol/139
World Academy of Science, Engineering and Technology
Clustering is a data mining technique and a subject of active research; it assists in biological pattern recognition and in the extraction of new knowledge from raw data. Clustering means partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters, yielding an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The KMeans algorithm is one of the most widely used partitional clustering techniques. Since KMeans is extremely sensitive to the initial choice of centers, and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initialize the KMeans centers. The improved KMeans algorithm is compared with the original KMeans, and the results show that efficiency is significantly improved.
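The sensitivity to initial centers described in the abstract can be illustrated with a minimal sketch of standard KMeans (Lloyd's iterative relocation). This is not the improved initialization strategy the authors propose, which the abstract does not detail. The function names (`dist2`, `assign`, `kmeans`), the four-point toy dataset, and the two sets of starting centers are assumptions chosen for illustration, with the sum of squared errors (SSE) assumed as the criterion function.

```python
# Illustrative sketch only: standard KMeans (Lloyd's iteration), NOT the
# improved initialization proposed in the paper. All names and data here
# are hypothetical.

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points]

def kmeans(points, init_centers, max_iter=100):
    """Run iterative relocation from the given initial centers."""
    centers = [tuple(float(x) for x in c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the center to the mean of its assigned points.
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(old)
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    labels = assign(points, centers)
    # Criterion function (assumed): sum of squared errors.
    sse = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse

# Toy data: two tight pairs, far apart along the x axis.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Starting centers near each pair recover the left/right split.
good_centers, good_labels, good_sse = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])

# Starting centers between the pairs converge to a top/bottom split.
bad_centers, bad_labels, bad_sse = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(good_sse, bad_sse)  # 1.0 16.0
```

Both runs converge, but the second stops at a local optimum whose SSE is sixteen times that of the first, which is the kind of gap that motivates a better initialization strategy.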
Open Science Index 139, 2018