Reducing SAGE Data Using Genetic Algorithms

Cheng-Hong Yang; Tsung-Mu Shih; Li-Yeh Chuang



Abstract:

Serial Analysis of Gene Expression (SAGE) is a powerful technique for quantifying gene expression in cells or tissues. Gene expression profiles of cells or tissues in several different states are difficult for biologists to analyze because of the large number of genes typically involved. Feature selection, a machine learning technique, can mitigate this problem: it reduces the number of features (genes) in a given SAGE data set and retains only the relevant genes. In this study, we used a genetic algorithm to perform feature selection and evaluated the classification accuracy of the selected features with the K-nearest neighbor method. To validate the proposed method, we tested it on two SAGE data sets. The results indicate that the number of features in the original SAGE data sets can be significantly reduced while achieving higher classification accuracy.

Keywords: Serial Analysis of Gene Expression, Feature selection, Genetic Algorithm, K-nearest neighbor method.
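
The approach summarized in the abstract, a genetic algorithm searching over gene subsets with K-nearest neighbor accuracy as the fitness, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the encoding, operators, or parameters, so the binary mask encoding, tournament selection, one-point crossover, elitism, leave-one-out evaluation, all parameter values, and the function names (`knn_loo_accuracy`, `ga_select`, `make_toy_data`) are assumptions. The toy data is synthetic and stands in for real SAGE tag counts.

```python
import random

def knn_loo_accuracy(X, y, mask, k=1):
    """Leave-one-out accuracy of k-NN using only features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in cols), y[m])
            for m in range(len(X)) if m != i
        )
        votes = [label for _, label in dists[:k]]
        if max(set(votes), key=votes.count) == y[i]:
            correct += 1
    return correct / len(X)

def fitness(X, y, mask, k=1):
    # Assumed fitness: accuracy first, fewer selected features as tie-breaker.
    return (knn_loo_accuracy(X, y, mask, k), -sum(mask))

def ga_select(X, y, pop_size=12, generations=15, p_cross=0.8,
              p_mut=0.05, k=1, seed=0):
    """Return a 0/1 mask over features (requires at least 2 features)."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(X, y, m, k), m) for m in pop]
        best = max(scored, key=lambda s: s[0])[1]

        def tournament():
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [best[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_cross:
                cut = rng.randint(1, n - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children
    return max(pop, key=lambda m: fitness(X, y, m, k))

def make_toy_data(n_samples=24, n_features=12, seed=1):
    """Synthetic two-class data: features 0 and 1 are informative, the rest noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.random() for _ in range(n_features)]
        row[0] = label * 5 + rng.random()
        row[1] = label * 5 + rng.random()
        X.append(row)
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    mask = ga_select(X, y)
    print("selected features:", [j for j, b in enumerate(mask) if b])
    print("LOO accuracy:", knn_loo_accuracy(X, y, mask))
```

The classifier sits inside the fitness function here, which makes this a wrapper-style search; each candidate subset costs one full K-nearest neighbor evaluation, so on data with thousands of genes the population size and generation count would need to be chosen with that cost in mind.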

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1075304

