A New Hybrid K-Mean-Quick Reduct Algorithm for Gene Selection

E. N. Sathishkumar; K. Thangavel; T. Chandrasekhar

Abstract:

Feature selection is the process of selecting the most informative features, and it is an important step in knowledge discovery. In gene expression data, not all genes are important: some are redundant, and others are irrelevant or noisy. This paper proposes a novel hybrid K-Mean-Quick Reduct (KMQR) algorithm for gene selection from gene expression data. The entire dataset is first divided into clusters by applying the K-Means algorithm, so that each cluster contains similar genes. Highly class-discriminative genes are then selected according to their degree of dependence by applying the Quick Reduct algorithm to every cluster. The Average Correlation Value (ACV) is calculated for the selected genes. Clusters with an ACV of 1 are identified as significant clusters, whose classification accuracy is equal to or higher than that of the entire dataset. The proposed algorithm is evaluated and compared using WEKA classifiers, and the results show that it achieves high classification accuracy.

Keywords: Clustering, Gene Selection, K-Mean-Quick Reduct, Rough Sets.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1335980
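
The two selection steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that K-Means has already grouped the genes, that expression values have been discretized before Quick Reduct is applied, and that ACV is taken as the larger of the mean absolute pairwise Pearson correlation over rows and over columns, a definition common in the biclustering literature. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def partition(rows, attrs):
    """Group sample indices into equivalence classes by their values on attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), []).append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Rough-set dependency degree: share of samples in the positive region,
    i.e. in equivalence classes whose members all carry the same class label."""
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)


def quick_reduct(rows, labels, candidates):
    """Greedy Quick Reduct: keep adding the gene that raises the dependency
    degree most until it matches the dependency of the full candidate set."""
    target = dependency(rows, labels, candidates)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        remaining = [a for a in candidates if a not in reduct]
        best = max(remaining, key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def acv(matrix):
    """Assumed ACV definition: max of the mean absolute pairwise correlation
    computed over the rows and over the columns of the submatrix."""
    def mean_abs_corr(vectors):
        pairs = list(combinations(vectors, 2))
        if not pairs:
            return 0.0
        return sum(abs(pearson(u, v)) for u, v in pairs) / len(pairs)

    return max(mean_abs_corr(matrix), mean_abs_corr(list(zip(*matrix))))


# Hypothetical toy data: 4 samples (rows) x 4 discretized genes (columns).
samples = [
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
]
classes = ["ALL", "ALL", "AML", "AML"]
cluster = [0, 1, 2, 3]  # gene indices assumed to come from one K-Means cluster

selected = quick_reduct(samples, classes, cluster)
print(selected)  # [0]: gene 0 alone separates the two classes

# Hypothetical expression submatrix whose rows are perfectly correlated.
print(acv([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]))
```

In the toy data, gene 0 alone determines the class, so the greedy search stops after one step. The second call returns a value that is 1 up to floating-point rounding, which under the assumed definition is the condition the abstract attaches to a significant cluster.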

