Imputation Technique for Feature Selection in Microarray Data Set
Authors: Younies Mahmoud, Mai Mabrouk, Elsayed Sallam
Abstract:
Analyzing DNA microarray data sets is a major challenge for bioinformaticians because of the complexity of applying statistical and machine learning techniques. The challenge is compounded when the data sets contain missing data, which happens regularly, because these techniques cannot deal with missing values. One of the most important data analysis processes for microarray data sets is feature selection, which finds the genes most relevant to a given disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.
Keywords: DNA microarray, feature selection, missing data, bioinformatics.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1099464