Upgraded Rough Clustering and Outlier Detection Method on Yeast Dataset by Entropy Rough K-Means Method
Authors: P. Ashok, G. M. Kadhar Nawaz
Abstract:
Rough set theory handles uncertainty and incomplete information by describing a concept with two crisp sets, the lower approximation and the upper approximation. In this paper, rough clustering is improved by adopting similarity-based, dissimilarity-similarity-based and entropy-based methods for selecting the initial centroids. Three clustering algorithms, namely Entropy based Rough K-Means (ERKM), Similarity based Rough K-Means (SRKM) and Dissimilarity-Similarity based Rough K-Means (DSRKM), were developed and executed on the yeast dataset. The rough clustering algorithms are validated with two cluster validity indexes, the Rand index and the Adjusted Rand index. Experimental results show that the ERKM clustering algorithm performs effectively and delivers better results than the other clustering methods. Outlier detection is an important task in data mining: outliers are objects that differ markedly from the rest of the objects in the clusters. The Entropy based Rough Outlier Factor (EROF) method appears to detect outliers effectively in the yeast dataset. In the rough K-Means (RKM) method, tuning the epsilon (ε) value between 0.8 and 1.08 allows outliers in the boundary region to be detected, and the RKM algorithm delivers better results when ε is chosen within this range. Experimental results show that the EROF method applied to the clustering algorithms performed very well and is suitable for detecting outliers effectively in all the datasets considered. Further, the experiments show that the ERKM clustering method outperformed the other methods.
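The rough K-Means step described in the abstract can be sketched as follows. This is a minimal sketch assuming the Lingras and West style assignment rule: an object joins the upper approximation of every cluster whose centroid distance is within a ratio ε of its nearest centroid distance, and only unambiguous objects enter the nearest cluster's lower approximation. The function name `rough_kmeans` and the weights w_lower = 0.7 and w_upper = 0.3 are hypothetical, illustrative choices, not values taken from this paper. Under this ratio form, ε must exceed 1 to produce any boundary objects; the abstract does not say how its 0.8 to 1.08 range maps onto the threshold.

```python
import math


def _mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def rough_kmeans(data, centroids, epsilon=1.05, w_lower=0.7, w_upper=0.3, max_iter=100):
    """Hypothetical rough K-Means sketch (Lingras and West style assignment).

    Returns (centroids, lower, upper); lower[k] and upper[k] are sets of row
    indices into `data`. Objects in upper[k] - lower[k] form the boundary
    region of cluster k. epsilon is assumed to be a distance-ratio threshold.
    """
    k = len(centroids)
    centroids = [list(c) for c in centroids]
    lower = [set() for _ in range(k)]
    upper = [set() for _ in range(k)]
    for _ in range(max_iter):
        lower = [set() for _ in range(k)]
        upper = [set() for _ in range(k)]
        for idx, x in enumerate(data):
            d = [math.dist(x, c) for c in centroids]
            i = min(range(k), key=d.__getitem__)  # nearest centroid
            # Other clusters whose centroid is almost as close (ratio <= epsilon).
            # An object sitting exactly on its nearest centroid is a certain member.
            close = [] if d[i] == 0 else [
                j for j in range(k) if j != i and d[j] / d[i] <= epsilon
            ]
            upper[i].add(idx)
            for j in close:
                upper[j].add(idx)
            if not close:
                lower[i].add(idx)  # unambiguous: also in the lower approximation
        new_centroids = []
        for c in range(k):
            low = [data[m] for m in lower[c]]
            bnd = [data[m] for m in upper[c] - lower[c]]
            if low and bnd:
                # Weighted mix of the lower-approximation mean and the boundary mean.
                new_centroids.append(
                    [w_lower * a + w_upper * b for a, b in zip(_mean(low), _mean(bnd))]
                )
            elif low:
                new_centroids.append(_mean(low))
            elif bnd:
                new_centroids.append(_mean(bnd))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # assignments and centroids have stabilised
        centroids = new_centroids
    return centroids, lower, upper
```

For example, with data [[0.0], [0.2], [1.0], [1.2], [0.6]], starting centroids [[0.0], [1.2]] and ε = 1.05, the midpoint 0.6 falls in the boundary region of both clusters, the other four objects land in lower approximations, and the centroids settle at about 0.25 and 0.95.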
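The abstract does not describe how ERKM chooses its entropy-based initial centroids. One common entropy-based seeding scheme, shown here purely as an assumption, converts pairwise distances into similarities S = exp(-αD), scores each object by the total binary entropy of its similarities to all other objects, and repeatedly takes the lowest-entropy object as a centroid while discarding objects whose similarity to it exceeds a threshold β. The function name `entropy_initial_centroids`, the choice of α (similarity 0.5 at the mean distance) and β = 0.7 are hypothetical, not the authors' procedure.

```python
import math


def entropy_initial_centroids(data, k, beta=0.7):
    """Hypothetical entropy-based seeding for (rough) K-Means.

    Assumes at least two distinct points. May return fewer than k centroids
    if the beta threshold removes every remaining candidate.
    """
    n = len(data)
    dist = [[math.dist(a, b) for b in data] for a in data]
    mean_d = sum(map(sum, dist)) / (n * (n - 1))  # mean over ordered pairs i != j
    alpha = math.log(2) / mean_d  # similarity is exactly 0.5 at the mean distance
    sim = [[math.exp(-alpha * d) for d in row] for row in dist]

    # Total binary entropy of each object's similarities to all other objects.
    ent = []
    for i in range(n):
        e = 0.0
        for j in range(n):
            s = sim[i][j]
            if j != i and 0.0 < s < 1.0:  # s == 0 or 1 contributes zero entropy
                e -= s * math.log2(s) + (1.0 - s) * math.log2(1.0 - s)
        ent.append(e)

    remaining = list(range(n))
    chosen = []
    while remaining and len(chosen) < k:
        best = min(remaining, key=ent.__getitem__)  # lowest entropy = densest spot
        chosen.append(best)
        # Discard the new centroid and every object too similar to it.
        remaining = [j for j in remaining if j != best and sim[best][j] <= beta]
    return [list(data[i]) for i in chosen]
```

Low entropy marks an object whose similarities are mostly close to 0 or 1, that is, one sitting inside a compact group, which makes it a stable seed. On the two groups [[0.0], [0.1], [0.2]] and [[5.0], [5.1], [5.2]] with k = 2, the sketch picks the middle object of each group, 0.1 and 5.1; the returned list can then serve as the starting centroids of a rough K-Means run.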
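The two validity indexes named in the abstract can be computed from plain label lists. In the sketch below, the Rand index is the fraction of object pairs on which two partitions agree (both together or both apart), and the Adjusted Rand index follows the Hubert and Arabie form, (index - expected index) / (max index - expected index), computed from the contingency table. The function names are illustrative. Rough clusters must first be reduced to crisp labels; the abstract does not say how boundary objects are handled for validation, so that step is left out as an assumption.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which the two partitions agree (needs n >= 2)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[a] == labels_true[b]) == (labels_pred[a] == labels_pred[b])
        for a, b in pairs
    )
    return agree / len(pairs)


def adjusted_rand_index(labels_true, labels_pred):
    """Hubert and Arabie adjusted Rand index from the contingency table."""
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(v, 2) for v in contingency.values())
    sum_rows = sum(comb(v, 2) for v in Counter(labels_true).values())
    sum_cols = sum(comb(v, 2) for v in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # expected index under random labelling
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)
```

For true labels [0, 0, 1, 1] and predicted labels [0, 0, 1, 2], the Rand index is 5/6 and the Adjusted Rand index is 4/7 (about 0.571); the adjustment removes the agreement expected by chance, which is why the adjusted value is lower.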
Keywords: Clustering, Entropy, Outlier, Rough K-Means, Validity index.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1111727