Gene Selection Guided by Feature Interdependence
Authors: Hung-Ming Lai, Andreas Albrecht, Kathleen Steinhöfel
Abstract:
Cancers are typically marked by a number of differentially expressed genes, which show great potential as biomarkers for the disease. In recent years, cancer classification based on gene expression profiles derived from high-throughput microarrays has been widely used. The selection of discriminative genes is therefore an essential preprocessing step in carcinogenesis studies. In this paper, we propose a novel gene selector that uses information-theoretic measures for biological discovery. This multivariate filter is a four-stage framework built on the analyses of feature relevance, feature interdependence, feature redundancy-dependence and subset ranking, and it has been evaluated on the colon cancer data set. Our experimental results show that the proposed method outperformed other information-theory-based filters in all aspects of classification error and classification performance.
Keywords: Colon cancer, feature interdependence, feature subset selection, gene selection, microarray data analysis.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1087374
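Illustrative sketch: the abstract names feature relevance and feature interdependence as information-theoretic analyses but gives no formulas. The Python sketch below is not the authors' four-stage method. It only shows, on toy data, why a multivariate filter can find genes that a univariate relevance score misses. The function names (`entropy`, `mutual_information`, `joint_mutual_information`, `interdependence_gain`) and the toy vectors `g1`, `g2`, `label` are assumptions introduced here for illustration, and the features are assumed to be already discretized.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy, in bits, of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """Relevance of a single feature: I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def joint_mutual_information(x1, x2, y):
    """Relevance of a feature pair: I(X1,X2;Y) = H(X1,X2) + H(Y) - H(X1,X2,Y)."""
    return entropy(x1, x2) + entropy(y) - entropy(x1, x2, y)


def interdependence_gain(x1, x2, y):
    """Extra information the pair carries beyond the two features taken alone.

    Hypothetical score for illustration only: a positive value means the
    features are more informative together than separately.
    """
    return (joint_mutual_information(x1, x2, y)
            - mutual_information(x1, y)
            - mutual_information(x2, y))


# Toy data (assumed, not from the paper): the label is g1 XOR g2.
g1 = [0, 0, 1, 1]
g2 = [0, 1, 0, 1]
label = [0, 1, 1, 0]

if __name__ == "__main__":
    print("I(g1;label)    =", mutual_information(g1, label))
    print("I(g2;label)    =", mutual_information(g2, label))
    print("I(g1,g2;label) =", joint_mutual_information(g1, g2, label))
    print("gain           =", interdependence_gain(g1, g2, label))
```

In this toy case each feature alone shares zero bits with the label, while the pair shares one full bit, so a filter that ranks genes only by individual relevance would discard both.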