Performance Analysis of Genetic Algorithm with kNN and SVM for Feature Selection in Tumor Classification
Authors: C. Gunavathi, K. Premalatha
Abstract:
Tumor classification is a key area of research in bioinformatics. Microarray technology is commonly used to study disease diagnosis through gene expression levels. The main drawback of gene expression data is that it contains thousands of genes but very few samples. Feature selection methods are used to select the informative genes from the microarray, and these methods considerably improve classification accuracy. In the proposed method, a Genetic Algorithm (GA) is used for effective feature selection. Informative genes are identified based on their T-Statistics, Signal-to-Noise Ratio (SNR) and F-Test values. The initial candidate solutions of the GA are obtained from the top-m informative genes. The classification accuracy of the k-Nearest Neighbor (kNN) method is used as the fitness function for the GA. In this work, kNN and Support Vector Machine (SVM) are used as the classifiers. The experimental results show that the proposed work is suitable for effective feature selection. With the selected genes, the GA-kNN method achieves 100% accuracy on 4 datasets and the GA-SVM method achieves 100% accuracy on 5 of the 10 datasets. The GA combined with kNN and SVM is demonstrated to be an accurate approach for microarray-based tumor classification.
Keywords: F-Test, Gene Expression, Genetic Algorithm, k-Nearest-Neighbor, Microarray, Signal-to-Noise Ratio, Support Vector Machine, T-Statistics, Tumor Classification.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1096103
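The pipeline summarized in the abstract (rank genes with a filter statistic, seed a GA from the top-m genes, and score each candidate subset by kNN accuracy) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the SNR form |mu0 - mu1| / (sd0 + sd1), leave-one-out evaluation of kNN, k = 3, tournament selection, one-point crossover, bit-flip mutation, elitism, all parameter values, and the synthetic data produced by the hypothetical helper `make_toy_data`. Only the SNR filter is shown; the T-Statistics and F-Test filters and the SVM classifier are omitted.

```python
import math
import random


def make_toy_data(n_samples=20, n_genes=50, seed=1):
    """Hypothetical synthetic data: only genes 0-2 differ between the two classes."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(0, 1) + (3.0 * c if g < 3 else 0.0) for g in range(n_genes)]
         for c in y]
    return X, y


def snr_scores(X, y):
    """Per-gene score |mu0 - mu1| / (sd0 + sd1) for a two-class problem (assumed form)."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, c in zip(X, y) if c == 0]
        b = [row[g] for row, c in zip(X, y) if c == 1]
        mu_a, mu_b = sum(a) / len(a), sum(b) / len(b)
        sd_a = math.sqrt(sum((v - mu_a) ** 2 for v in a) / len(a))
        sd_b = math.sqrt(sum((v - mu_b) ** 2 for v in b) / len(b))
        scores.append(abs(mu_a - mu_b) / (sd_a + sd_b + 1e-12))
    return scores


def loo_knn_accuracy(X, y, genes, k=3):
    """Leave-one-out accuracy of kNN (squared Euclidean distance) on the given genes."""
    if not genes:
        return 0.0  # an empty gene subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        neighbours = sorted(
            (sum((X[i][g] - X[j][g]) ** 2 for g in genes), y[j])
            for j in range(len(X)) if j != i
        )
        votes = [label for _, label in neighbours[:k]]
        predicted = max(set(votes), key=votes.count)
        correct += predicted == y[i]
    return correct / len(X)


def ga_select(X, y, m=10, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Search bit masks over the top-m SNR genes; fitness is kNN accuracy."""
    rng = random.Random(seed)
    scores = snr_scores(X, y)
    top = sorted(range(len(scores)), key=lambda g: -scores[g])[:m]
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:  # avoid re-evaluating identical subsets
            subset = [g for g, bit in zip(top, mask) if bit]
            cache[key] = loo_knn_accuracy(X, y, subset)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, m)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
    return [g for g, bit in zip(top, best) if bit], fitness(best)


if __name__ == "__main__":
    X, y = make_toy_data()
    genes, acc = ga_select(X, y)
    print("selected genes:", genes, "LOO kNN accuracy:", acc)
```

Restricting the chromosome to the top-m ranked genes keeps the search space at 2^m masks rather than 2^(number of genes), which is the practical reason to combine a filter step with the GA wrapper when samples are few and genes number in the thousands.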