Feature Selection Approaches with Missing Values Handling for Data Mining - A Case Study of Heart Failure Dataset

N. Poolsawad; C. Kambhampati; J. G. F. Cleland

Abstract:

In this paper, we investigated the characteristics of a clinical dataset with respect to feature selection and classification measures when the data contain missing values. We also proposed appropriate techniques to achieve the aim of the research, which is to find the features that have a high effect on mortality and on the mortality time frame. We quantified the complexity of the clinical dataset and, in response to that complexity, proposed a data mining process that copes with missing values, high dimensionality, and the prediction problem by using missing value replacement, feature selection, and classification methods. The experimental results will be extended to develop a prediction model for cardiology.
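The abstract names three stages (missing value replacement, feature selection, classification) without fixing the specific methods. As a rough sketch under stated assumptions, the stages could be chained as below. Mean imputation, a Welch t-statistic filter, and a nearest-centroid classifier are stand-ins chosen for brevity, not necessarily the methods used in the paper, and the per-feature missing rate is only one assumed ingredient of dataset "complexity". The toy rows, labels, and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Toy data, invented for illustration (NOT the heart failure dataset):
# each row is a patient, each column a feature, None marks a missing value.
rows = [
    [70, 1.9, 5.0],
    [75, None, 4.0],
    [68, 2.1, 6.0],
    [50, 1.0, None],
    [55, 1.2, 5.0],
    [48, 0.9, 4.0],
]
labels = [1, 1, 1, 0, 0, 0]  # hypothetical outcome: 1 = died, 0 = survived


def missing_rate(data):
    """Fraction of missing (None) entries in each feature column."""
    n = len(data)
    return [sum(r[j] is None for r in data) / n for j in range(len(data[0]))]


def mean_impute(data):
    """Replace each None with the mean of the observed values in its column.

    Raises statistics.StatisticsError if a column has no observed values.
    """
    means = [mean(v for v in col if v is not None) for col in zip(*data)]
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in data]


def welch_t(xs, ys):
    """Welch's two-sample t statistic (unequal variances, each group n >= 2)."""
    se = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / se


def rank_features(data, y):
    """Feature indices sorted by |t| between class 1 and class 0, largest first."""
    scores = []
    for j in range(len(data[0])):
        pos = [r[j] for r, c in zip(data, y) if c == 1]
        neg = [r[j] for r, c in zip(data, y) if c == 0]
        scores.append(abs(welch_t(pos, neg)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def nearest_centroid(data, y, features):
    """Build a classifier that predicts the class with the closest centroid
    (squared Euclidean distance over the selected features only)."""
    centroids = {
        c: [mean(r[j] for r, k in zip(data, y) if k == c) for j in features]
        for c in set(y)
    }

    def predict(row):
        x = [row[j] for j in features]
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


# Pipeline: replace missing values -> select features -> classify.
print("missing rate per feature:", missing_rate(rows))
complete = mean_impute(rows)
selected = rank_features(complete, labels)[:2]  # keep the top two features
predict = nearest_centroid(complete, labels, selected)
print("selected features:", selected)
print("prediction for a new patient:", predict([72, 2.0, 5.0]))
```

Because the features are not standardized, a wide-range feature such as column 0 dominates the distance in this sketch; scaling the features first (for example to zero mean and unit variance) would be the usual next step on real clinical data.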

Keywords: feature selection, missing values, classification, clinical dataset, heart failure.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1085912