The traditional k-means algorithm has been widely used as a simple and efficient clustering method. However, it often converges to a local minimum because it is sensitive to the choice of initial cluster centers. In this paper, an algorithm for selecting initial cluster centers on the basis of a minimum spanning tree (MST) is presented. The vertices of the MST that share the same degree are treated as a group and used to identify skeleton data points. Furthermore, a distance measure between skeleton data points that accounts for both vertex degree and Euclidean distance is proposed. Finally, the MST-based initialization method for the k-means algorithm is presented, and its time complexity is analyzed. The proposed algorithm is tested on five data sets from the UCI Machine Learning Repository, and the experimental results illustrate its effectiveness compared with three existing initialization methods.

World Academy of Science, Engineering and Technology, Open Science Index 121, 2017.
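The abstract does not give the exact definitions of the skeleton points or of the degree-aware distance, so the following Python sketch is only a plausible illustration of the general idea: build the MST of the data with Prim's algorithm, compute each vertex's degree, and seed k-means with the highest-degree vertices. The function names (`mst_edges`, `mst_seeds`) and the top-degree selection heuristic are assumptions for illustration, not the paper's actual procedure.

```python
import numpy as np

def mst_edges(X):
    """Prim's algorithm on the complete Euclidean graph over the rows of X.
    Returns a list of (u, v) index pairs forming the MST."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = d[0].copy()               # cheapest known edge from the tree to each vertex
    parent = np.zeros(n, dtype=int)  # tree endpoint of that cheapest edge
    edges = []
    for _ in range(n - 1):
        best[in_tree] = np.inf       # never re-select a tree vertex
        v = int(np.argmin(best))
        edges.append((parent[v], v))
        in_tree[v] = True
        closer = (~in_tree) & (d[v] < best)  # update cheapest edges via v
        best[closer] = d[v][closer]
        parent[closer] = v
    return edges

def mst_seeds(X, k):
    """Pick k initial k-means centers: the k highest-degree MST vertices,
    a stand-in for the paper's degree-based 'skeleton' data points."""
    deg = np.zeros(len(X), dtype=int)
    for u, v in mst_edges(X):
        deg[u] += 1
        deg[v] += 1
    order = np.argsort(-deg)         # highest degree first (ties broken arbitrarily)
    return X[order[:k]]
```

The returned array can be passed directly as the initial centers of any standard k-means implementation (e.g. scikit-learn's `KMeans(init=seeds, n_init=1)`), which is where an MST-based seeding would plug in.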