Gene Selection Guided by Feature Interdependence

Hung-Ming Lai; Andreas Albrecht; Kathleen Steinhöfel

Abstract:

Cancers are typically characterized by a number of differentially expressed genes, which show considerable potential as biomarkers for a given disease. In recent years, cancer classification based on gene expression profiles derived from high-throughput microarrays has been widely used. The selection of discriminative genes is, therefore, an essential preprocessing step in carcinogenesis studies. In this paper, we propose a novel gene selector that uses information-theoretic measures for biological discovery. This multivariate filter is a four-stage framework comprising analyses of feature relevance, feature interdependence, feature redundancy-dependence and subset rankings, and it has been evaluated on the colon cancer data set. Our experimental results show that the proposed method outperformed other information-theoretic filters in all aspects of classification error and classification performance.
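The abstract does not give the formulas behind the four stages, so the following is only a minimal sketch of quantities that information-theoretic filters commonly build on: mutual information as a relevance score, and the gain in relevance of one gene once a second gene is known as a rough interdependence score. It assumes expression values have already been discretized; the function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """Relevance score: I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def interdependence(x1, x2, y):
    """Illustrative score (an assumption, not the paper's definition):
    I(X1;Y|X2) - I(X1;Y). Positive when X1 says more about Y once X2 is
    known; negative when X2 makes X1 redundant."""
    return conditional_mutual_information(x1, y, x2) - mutual_information(x1, y)


# Toy, already-discretized "genes": the class label is XOR of g1 and g2,
# so neither gene is relevant alone but the pair determines the label.
g1 = [0, 0, 1, 1]
g2 = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(g1, g2)]

relevance_g1 = mutual_information(g1, label)
synergy_g1_g2 = interdependence(g1, g2, label)

# A duplicated gene is fully redundant with respect to its copy.
redundancy_copy = interdependence(g1, list(g1), g1)
```

In this toy case a univariate relevance ranking would discard `g1`, whereas the pairwise score flags it as informative in combination with `g2`, which is the kind of situation a multivariate filter is meant to handle.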

Keywords: Colon cancer, feature interdependence, feature subset selection, gene selection, microarray data analysis.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1087374
