Comparison of Multivariate Adaptive Regression Splines and Random Forest Regression in Predicting Forced Expiratory Volume in One Second
Authors: P. V. Pramila, V. Mahesh
Abstract:
Pulmonary Function Tests are important non-invasive diagnostic tests that assess respiratory impairments and provide quantifiable measures of lung function. Spirometry is the most frequently used measure of lung function and plays an essential role in the diagnosis and management of pulmonary diseases. However, the test requires considerable patient effort and cooperation, which depend markedly on the age of the patient, and this results in incomplete data sets. This paper presents nonlinear models built using Multivariate Adaptive Regression Splines (MARS) and Random Forest (RF) regression to predict the missing spirometric features. Random forest based feature selection is used to enhance both the generalization capability and the interpretability of the model. In the present study, flow-volume data are recorded for N = 198 subjects. The ranked order of the feature importance index calculated by the random forest model shows that the spirometric features FVC, FEF25, PEF, FEF25-75, FEF50 and the demographic parameter height are the important descriptors. A comparison of the performance of both models shows that the prediction ability of MARS with the top two ranked features, namely FVC and FEF25, is higher, yielding a model fit of R2 = 0.96 for normal subjects and R2 = 0.99 for abnormal subjects. The Root Mean Square Error (RMSE) analysis of the RF model and the MARS model also shows that the latter is capable of predicting the missing values of FEV1 with a notably lower error of 0.0191 (normal subjects) and 0.0106 (abnormal subjects) using the aforementioned input features. It is concluded that combining feature selection with a prediction model provides a minimum subset of predominant features to train the model and also yields better prediction performance. This analysis can assist clinicians by providing an intelligent support system for medical diagnosis and the improvement of clinical care.
Keywords: FEV1, Multivariate Adaptive Regression Splines, Pulmonary Function Test, Random Forest.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1100150
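Illustrative sketch (not the authors' implementation): the abstract evaluates a MARS model of FEV1 on the inputs FVC and FEF25 using R2 and RMSE. The Python fragment below shows, under stated assumptions, what a MARS-style hinge basis function looks like and how the two metrics are computed. The function names, knots, coefficients, and units in `predict_fev1` are hypothetical placeholders chosen for illustration; they are not values reported in the paper.

```python
import math


def hinge(x, knot, direction=1):
    """MARS-style hinge basis: max(0, x - knot) for direction=1,
    max(0, knot - x) for direction=-1."""
    return max(0.0, direction * (x - knot))


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def predict_fev1(fvc, fef25):
    """Toy additive model built from hinge terms on FVC and FEF25.

    ASSUMPTION: the intercept, coefficients, and knots (2.0 and 4.0) are
    made up for illustration and do not come from the study. A real MARS
    fit selects knots and terms from the data.
    """
    return (
        0.5
        + 0.7 * hinge(fvc, 2.0)          # slope above the FVC knot
        - 0.3 * hinge(fvc, 2.0, -1)      # slope below the FVC knot
        + 0.05 * hinge(fef25, 4.0)       # contribution of FEF25 above its knot
    )
```

Given measured FEV1 values and the corresponding model predictions as two equal-length lists, `rmse(measured, predicted)` and `r_squared(measured, predicted)` give the two figures of merit quoted in the abstract; lower RMSE and R2 closer to 1 indicate a better fit.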