Simultaneous Clustering and Feature Selection Method for Gene Expression Data
Authors: T. Chandrasekhar, K. Thangavel, E. N. Sathishkumar
Abstract:
Microarrays have made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. They are used to identify the co-expressed genes in specific cells or tissues that are actively used to make proteins, and analyzing such gene expression data is an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes and biologically relevant groupings of genes and samples. In this work, the K-Means algorithm is applied to cluster gene expression data. The rough set based Quick reduct algorithm is then applied to each cluster in order to select the most similar genes having high correlation. The ACV measure is used to evaluate the refined clusters, and classification is used to evaluate the proposed method. The proposed method identifies compact clusters, with the genes in each cluster selected by the feature selection method.
Keywords: Clustering, Feature selection, Gene expression data, Quick reduct.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1336220
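The abstract names three steps: K-Means clustering, rough set based Quick reduct selection, and ACV evaluation. The following is a minimal Python sketch of those steps, not the authors' implementation. Several things are assumptions made for illustration: the toy matrix `genes`, the toy decision table, initializing K-Means with the first k rows, and reading ACV as an average correlation value (the mean absolute Pearson correlation over row pairs or column pairs, whichever is larger). How the paper discretizes expression values and builds the decision table for each cluster is not stated in the abstract, so Quick reduct is shown on a separate, hypothetical table.

```python
import math
from itertools import combinations

def kmeans(rows, k, iters=20):
    """Plain K-Means. Assumption: the first k rows serve as initial centers."""
    centers = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(r, centers[c])))
                  for r in rows]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def dependency(table, attrs, decision):
    """Rough-set dependency degree: share of objects in the positive region."""
    blocks = {}
    for row, d in zip(table, decision):
        blocks.setdefault(tuple(row[a] for a in attrs), []).append(d)
    consistent = sum(len(ds) for ds in blocks.values() if len(set(ds)) == 1)
    return consistent / len(table)

def quick_reduct(table, decision):
    """Greedy Quick reduct: add the attribute that raises dependency the most
    until the subset matches the dependency of the full attribute set."""
    all_attrs = list(range(len(table[0])))
    target = dependency(table, all_attrs, decision)
    reduct = []
    while dependency(table, reduct, decision) < target:
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(table, reduct + [a], decision))
        reduct.append(best)
    return reduct

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def acv(matrix):
    """Assumed ACV: larger of the mean |correlation| over row pairs and column pairs."""
    def mean_abs_corr(vectors):
        pairs = list(combinations(vectors, 2))
        return sum(abs(pearson(x, y)) for x, y in pairs) / len(pairs) if pairs else 0.0
    return max(mean_abs_corr(matrix), mean_abs_corr(list(zip(*matrix))))

# Hypothetical expression matrix: rows are genes, columns are conditions.
genes = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [0.9, 1.8, 3.2, 3.9],
    [8.0, 6.0, 4.0, 2.0],
    [8.2, 6.1, 3.8, 2.1],
    [7.9, 5.8, 4.1, 1.9],
]
labels = kmeans(genes, k=2)
for c in sorted(set(labels)):
    cluster = [g for g, lab in zip(genes, labels) if lab == c]
    print("cluster", c, "size", len(cluster), "ACV", round(acv(cluster), 3))

# Hypothetical discretized decision table: only attribute 0 determines the decision.
toy_table = [[0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 1]]
toy_decision = [0, 0, 1, 1]
print("reduct:", quick_reduct(toy_table, toy_decision))
```

In a fuller pipeline, the attributes chosen by `quick_reduct` would be used to refine each cluster before it is scored with `acv`. The sketch keeps the two steps separate because the abstract does not specify how they are connected.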