Statistical Measures and Optimization Algorithms for Gene Selection in Lung and Ovarian Tumor

Authors: C. Gunavathi, K. Premalatha


Microarray technology is widely used to study disease diagnosis through gene expression levels. The main shortcoming of gene expression data is that it includes thousands of genes but only a small number of samples. Many methods and techniques have been proposed for tumor classification using microarray gene expression data. Feature (gene) selection methods can be used to identify the genes that are directly involved in the classification and to eliminate irrelevant genes. In this paper, statistical measures such as T-Statistics, Signal-to-Noise Ratio (SNR), and F-Statistics are used to rank the genes. The ranked genes are used for further classification. The Particle Swarm Optimization (PSO) algorithm and the Shuffled Frog Leaping (SFL) algorithm are used to find the significant genes among the top-m ranked genes. The Naïve Bayes Classifier (NBC) classifies the samples based on the significant genes. The proposed work is applied to Lung and Ovarian datasets. The experimental results show that the proposed method achieves 100% accuracy on all three datasets, and the results are compared with previous works.
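The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: the SNR is taken in its common two-class form, (mean1 - mean0) / (sd1 + sd0), and the t-statistic in its unequal-variance (Welch) form. The function names (`snr_score`, `t_score`, `rank_genes`) and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev, variance


def snr_score(pos, neg):
    """Signal-to-noise ratio for one gene (assumed form): (mu1 - mu0) / (sd1 + sd0)."""
    return (mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg))


def t_score(pos, neg):
    """Two-sample t-statistic for one gene (assumed Welch form)."""
    pooled = variance(pos) / len(pos) + variance(neg) / len(neg)
    return (mean(pos) - mean(neg)) / pooled ** 0.5


def rank_genes(expr, labels, score, m):
    """Return the indices of the top-m genes, ranked by absolute score.

    expr:   list of samples, each a list of expression values (one per gene)
    labels: list of 0/1 class labels, one per sample
    score:  function taking (class-1 values, class-0 values) for one gene
    """
    n_genes = len(expr[0])
    scores = []
    for g in range(n_genes):
        pos = [row[g] for row, y in zip(expr, labels) if y == 1]
        neg = [row[g] for row, y in zip(expr, labels) if y == 0]
        scores.append(abs(score(pos, neg)))
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order[:m]


# Toy data (made up for illustration): 6 samples, 3 genes.
# Gene 0 separates the classes strongly, gene 2 weakly, gene 1 not at all.
expr = [
    [5.0, 1.0, 2.0],
    [5.2, 0.8, 2.5],
    [4.8, 1.2, 1.9],
    [1.0, 1.1, 1.5],
    [1.2, 0.9, 2.2],
    [0.8, 1.0, 1.4],
]
labels = [1, 1, 1, 0, 0, 0]

top_by_snr = rank_genes(expr, labels, snr_score, m=2)
top_by_t = rank_genes(expr, labels, t_score, m=2)
print(top_by_snr, top_by_t)
```

In the pipeline the abstract outlines, the top-m indices returned here would form the candidate pool from which PSO or SFL searches for a smaller subset, with the classifier's accuracy plausibly serving as the fitness value. That search is not shown in this sketch.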

Keywords: Particle Swarm Optimization, Microarray, Naïve Bayes Classifier, Signal-to-Noise Ratio, T-Statistics, Shuffled Frog Leaping, F-Statistics



