Oncogene Identification using Filter based Approaches between Various Cancer Types in Lung
Authors: Michael Netzer, Michael Seger, Mahesh Visvanathan, Bernhard Pfeifer, Gerald H. Lushington, Christian Baumgartner
Abstract:
Lung cancer accounts for the most cancer-related deaths in both men and women. Identifying cancer-associated genes and their related pathways is essential for the prevention of many types of cancer. In this work, two filter approaches, namely information gain and the biomarker identifier (BMI), are used to identify genes that distinguish different types of small-cell and non-small-cell lung cancer. A new method for determining the BMI thresholds is proposed, which uses k-means clustering to prioritize genes as primary, secondary, or tertiary. Sets of key genes were identified that occur in several pathways. The modified BMI proved well suited to microarray data and is therefore proposed as a powerful tool in the search for new, so far undiscovered cancer-related genes.
Keywords: lung cancer, microarrays, data mining, feature selection.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1055373
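
The two building blocks named in the abstract, an information-gain filter and a k-means grouping of scores into three priority tiers, can be illustrated with a minimal sketch. This is not the authors' implementation: the BMI score is not defined in this abstract, so the tiering below runs on placeholder scores, and all function names (`entropy`, `information_gain`, `kmeans_1d`, `tiers`), the single-threshold split, and the deterministic initialisation are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Information gain of splitting one continuous feature at `threshold`.

    Assumption: a single binary split is used here; the paper may
    discretise expression values differently.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    conditional = sum(len(p) / n * entropy(p) for p in (left, right) if p)
    return entropy(labels) - conditional


def kmeans_1d(scores, k=3, iters=100):
    """Lloyd's k-means on one-dimensional scores.

    Centres are initialised deterministically at evenly spaced positions
    of the sorted scores (an assumption, chosen for reproducibility).
    """
    s = sorted(scores)
    centers = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in scores:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        updated = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers


def tiers(scores, names=("tertiary", "secondary", "primary")):
    """Label each score by its nearest cluster centre, lowest centre first."""
    centers = sorted(kmeans_1d(scores, k=len(names)))
    return [names[min(range(len(names)), key=lambda c: abs(x - centers[c]))]
            for x in scores]


# Hypothetical per-gene scores standing in for BMI values.
example_scores = [0.10, 0.12, 0.50, 0.55, 0.90, 0.95]
example_tiers = tiers(example_scores)
```

In this sketch the boundaries between tiers fall where the nearest cluster centre changes, which is one plausible way to read "thresholds determined by k-means"; the paper itself should be consulted for the actual procedure.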