{"title":"A Computational Cost-Effective Clustering Algorithm in Multidimensional Space Using the Manhattan Metric: Application to the Global Terrorism Database","authors":"Semeh Ben Salem, Sami Naouali, Moetez Sallami","volume":126,"journal":"International Journal of Computer and Systems Engineering","pagesStart":702,"pagesEnd":708,"ISSN":"1307-6892","URL":"https:\/\/publications.waset.org\/pdf\/10007220","abstract":"
The growing volume of collected data strains the performance of current analysis algorithms, so developing new algorithms that are cost-effective in terms of complexity, scalability, and accuracy has attracted significant interest. In this paper, a modified k-means-based algorithm is developed and evaluated. The new algorithm aims to reduce the computational load without significantly affecting the quality of the clusterings. It uses the City Block (Manhattan) distance and a new stopping criterion to guarantee convergence. Experiments on a real data set show its high performance compared with the original k-means algorithm.","references":"[1]\tDe Bruin, J. S, Cocx, T. K, Kosters, W. A, Laros, \u201cData Mining approaches to criminal career analysis.\u201d In Proceedings of the 6th International Conference on Data Mining ICDM\u201906, pp 11-18, 2006.\r\n[2]\tT. Abraham and O. de Vel, \u201cInvestigating profiling with computer forensic log data and associations rules.\u201d Proceedings of the IEEE International Conference on Data Mining (ICDM\u201906), pp 11-18, 2006.\r\n[3]\tJiawei Han M. K, \u201cData Mining concepts and techniques.\u201d Morgan Kaufmann Publishers, An Imprint of Elsevier, 2006.\r\n[4]\tHuang Z, \u201cExtension to the k-means algorithm for clustering large datasets with categorical values\u201d, Data Mining and Knowledge Discovery, (2):283-304, 1998.\r\n[5]\tAmir Ahmad, Lipika Dey, \u201cA k-means clustering algorithm for mixed numeric and categorical data.\u201d Data and Knowledge Engineering 63, pp 503-527, 2007.\r\n[6]\tV. Ganti, J. E Gekhre, R. Ramakrishnan, \u201cCACTUS clustering categorical data using summaries\u201d, Proceedings of the 5th ACM SIGKDD International Conference On Knowledge Discovery and Data Mining, 1999, pp 73-83.\r\n[7]\tT. Zhang, R. Ramakrishnan, M. 
Livny, \u201cBIRCH: an efficient data clustering method for very large databases.\u201d SIGMOD Conference, 1996, pp 103-114.\r\n[8]\tDong Kuan Xu, Yingjie Tian, \u201cA Comprehensive Survey of Clustering Algorithms\u201d, Ann. Data. Sci. , Springer-Verlag Berlin Heidelberg 2015, DOI 10.1007\/s40745-015-0040-1\r\n[9]\tCelebi M E, Kingravi H A, Vela P A, \u201cA comparative study of efficient initialization methods for the k-means clustering algorithm\u201d. Expert Systems with Applications 40:200\u2013210, 2013.\r\n[10]\tCelebi M E, Kingravi H, \u201cDeterministic initialization of the K-means algorithm using hierarchical clustering\u201d, International Journal of Pattern Recognition and Artificial Intelligence 26(7):1250018, 2012.\r\n[11]\tCelebi M E, Kingravi H, \u201cLinear, deterministic, and order-invariant initialization methods for the K-means clustering algorithm.\u201d Celebi M E (ed) Partitional clustering algorithms. Springer, Berlin, pp 79\u201398, 2014.\r\n[12]\tKalogeratos A, Likas A, \u201cDip-means: an incremental clustering method for estimating the number of clusters.\u201d In: Advances in neural information processing systems (NIPS), pp 2402\u20132410, 2012.\r\n[13]\tTzortzis G, Likas A, \u201cThe Min-Max k-Means clustering algorithm\u201d. Pattern Recognition 47:2505\u20132516, 2014.\r\n[14]\tEslamnezhad M, Varjani A Y, \u201cIntrusion detection based on Min-Max K-means clustering.\u201d In 7th International symposium on telecommunications (IST\u20192014), pp 804\u2013808, 2014.\r\n[15]\tYuan F, Meng Z. H, Zhang H, X and Dong C. 
R, \u201cA new algorithm to get the initial centroids.\u201d Proceedings of the 3rd International Conference on Machine Learning and Cybernetics, pages 26-29, 2004.\r\n[16]\tXiaoyan Wang, Yanping Bai, \u201cThe global Min-Max k means algorithm\u201d, Wang and Bai SpringerPlus 5:1665, DOI 10.1186\/s40064-016-3329-4, 2016.\r\n[17]\tZengyou He, Shengchun Deng \u201cImproving K-modes Algorithm considering frequencies of attributes values in mode.\u201d Conference paper in Lecture notes in computer science, December 2005.\r\n[18]\tG. La Free, \u201cThe Global Terrorism Database: Accomplishments and Challenges\u201d, Perspectives on Terrorism, Vol. 4 (2010).\r\n[19]\tX. Wang, E. Miller, K. Smarick, W. Ribarsky and R. Chang, \u201cInvestigative Visual Analysis of Global terrorism.\u201d, Proceeding of the 10th Joint Eurographics\/ IEEE-VGTC conference on Visualization, Vol. 27 (2008): 919-926.\r\n[20]\tM. Adnan, M. Rafi, \u201cExtracting patterns from Global Terrorism Database (GTD) using co-clustering approach.\u201d Journal of independent studies and research computing, Volume 13, 2015.\r\n[21]\tSemeh Ben Salem and Sami Naouali, \u201cPattern Recognition Approach in Multidimensional Databases: Application to the Global Terrorism Database\u201d International Journal of Advanced Computer Science and Applications (IJACSA), 7(8), 2016.\r\n[22]\tSilke Wagner, Dorothea Wagner, \u201cComparing Clusterings-An Overview\u201d, January 12, 2007.","publisher":"World Academy of Science, Engineering and Technology","index":"Open Science Index 126, 2017"}