Oncogene Identification using Filter based Approaches between Various Cancer Types in Lung

Authors: Michael Netzer, Bernhard Pfeifer, Christian Baumgartner, Michael Seger, Mahesh Visvanathan, Gerald H. Lushington


Lung cancer accounts for the most cancer-related deaths in both men and women. Identifying cancer-associated genes and their related pathways is essential and offers an important opportunity for the prevention of many types of cancer. In this work, two filter approaches, information gain and the biomarker identifier (BMI), are applied to the identification of different types of small-cell and non-small-cell lung cancer. A new method is proposed that uses k-means clustering to determine the BMI thresholds and thereby prioritize genes as primary, secondary, or tertiary. Sets of key genes were identified that occur in several pathways. The modified BMI proved well suited to microarray data, and it is therefore proposed as a powerful tool in the search for as yet undiscovered cancer-related genes.
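The two ideas named in the abstract, a filter score computed per gene and a k-means step that splits the scores into three priority tiers, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The BMI formula is not given here, so information gain stands in as the score that is clustered. The median split used to discretize expression values, the deterministic centroid initialisation, the gene names, and the toy expression values are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one feature after a binary split at its median.

    The median split is an assumption of this sketch, not taken from the paper.
    """
    threshold = sorted(values)[len(values) // 2]
    low = [y for x, y in zip(values, labels) if x < threshold]
    high = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    conditional = sum(len(p) / n * entropy(p) for p in (low, high) if p)
    return entropy(labels) - conditional


def kmeans_1d(scores, k=3, iterations=100):
    """Lloyd-style k-means on scalar scores (k >= 2); returns sorted centroids."""
    ordered = sorted(scores)
    # Deterministic start: spread the initial centroids over the sorted scores.
    centroids = [ordered[int(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for s in scores:
            nearest = min(range(k), key=lambda j: abs(s - centroids[j]))
            clusters[nearest].append(s)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def prioritize(scores_by_gene, tiers=("tertiary", "secondary", "primary")):
    """Assign each gene to the tier whose centroid lies closest to its score."""
    centroids = kmeans_1d(list(scores_by_gene.values()), k=len(tiers))
    return {
        gene: tiers[min(range(len(tiers)), key=lambda j: abs(score - centroids[j]))]
        for gene, score in scores_by_gene.items()
    }


# Hypothetical toy data: 4 small-cell and 4 non-small-cell samples.
labels = ["SCLC"] * 4 + ["NSCLC"] * 4
expression = {
    "gene_a": [1, 2, 3, 4, 5, 6, 7, 8],  # separates the classes perfectly
    "gene_b": [8, 7, 6, 5, 4, 3, 2, 1],  # perfect separation, reversed
    "gene_c": [1, 2, 3, 5, 4, 6, 7, 8],  # one sample per class on the wrong side
    "gene_d": [1, 2, 6, 3, 4, 5, 7, 8],  # one sample per class on the wrong side
    "gene_e": [1, 2, 5, 6, 3, 4, 7, 8],  # uninformative
    "gene_f": [5, 6, 1, 2, 7, 8, 3, 4],  # uninformative
}

scores = {gene: information_gain(v, labels) for gene, v in expression.items()}
tiers = prioritize(scores)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(f"{gene}: score={scores[gene]:.3f} tier={tiers[gene]}")
```

Clustering the scores avoids choosing tier cut-offs by hand: the boundaries fall wherever the score distribution has natural gaps. On real microarray data a library implementation of k-means and a more careful discretization would be the more usual choice.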

Keywords: Data mining, lung cancer, feature selection, microarrays


