Using Statistical Significance and Prediction to Test Long/Short Term Public Services and Patients Cohorts: A Case Study in Scotland
Authors: Sotirios Raptis
Abstract:
Health and Social Care (HSc) service planning and scheduling face unprecedented challenges due to pandemic pressure, and they also suffer from unplanned spending that has been negatively affected by the global financial crisis. Data-driven approaches can help improve policies and plan and design service provision schedules, using algorithms that help healthcare managers meet unexpected demands with fewer resources. The paper discusses service packing, using statistical significance tests and machine learning (ML) to evaluate the similarity and coupling of demands. This is achieved by predicting the range of the demand (its class) with ML methods such as Classification and Regression Trees (CART), Random Forests (RF), and Logistic Regression (LGR). The Chi-Squared and Student’s t significance tests are applied to data on services delivered in Scotland over the 39-year span for which data exist. The demands are associated using probabilities and form parts of statistical hypotheses. The NULL part of these hypotheses assumes that the target demand is statistically dependent on the demands of other services, and this link is then checked against the data. In addition, ML methods are used to linearly predict these target demands from the statistically identified associations, extending the linear dependence of the target demand to independent demands and thus forming groups of services. The statistical tests confirmed the ML coupling and made the prediction statistically meaningful, showing that a target service can be matched reliably to other services, while ML showed that such relationships can also be linear. Zero padding was used for years with missing records and better illustrated these relationships, both for limited periods and for the entire span. The entire span offered long-term data visualizations, while limited periods explained how closely patient numbers can be related over short periods of time, or how they can change over time compared with behaviour across more years.
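As a minimal sketch of the data preparation and significance testing described above, the following Python code zero-pads yearly records for missing years, turns each yearly series into a binary demand class, and runs a Pearson chi-squared test on the resulting 2x2 table. The median cut-off used as the class rule, the function names, and the example counts are all hypothetical and are not taken from the paper. Note also that the standard Pearson test shown here takes independence as its null hypothesis; the sketch does not reproduce how the paper frames its NULL hypothesis or maps p-values to packing decisions.

```python
import math
from statistics import median


def zero_pad(records, first_year, last_year):
    """Return one value per year from first_year to last_year, using 0 for years with no record."""
    return [records.get(year, 0.0) for year in range(first_year, last_year + 1)]


def to_classes(series):
    """Hypothetical class rule (an assumption): 1 if a year's demand is above the median, else 0."""
    cut = median(series)
    return [1 if value > cut else 0 for value in series]


def chi2_2x2(classes_a, classes_b):
    """Pearson chi-squared test of independence for two binary class series.

    Uses 1 degree of freedom and no Yates continuity correction. Returns (statistic, p_value).
    """
    if not classes_a or len(classes_a) != len(classes_b):
        raise ValueError("both series must be non-empty and cover the same years")
    n = len(classes_a)
    table = [[0, 0], [0, 0]]  # table[class_a][class_b] = number of years
    for a, b in zip(classes_a, classes_b):
        table[a][b] += 1
    row_totals = [table[i][0] + table[i][1] for i in range(2)]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("one class never occurs, so the test is undefined")
            stat += (table[i][j] - expected) ** 2 / expected
    # A chi-squared variable with 1 df is a squared standard normal,
    # so P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical yearly patient counts; home care has no record for 2003.
    home_care = {2000: 120, 2001: 130, 2002: 150, 2004: 170, 2005: 180}
    emergency = {2000: 300, 2001: 310, 2002: 340, 2003: 360, 2004: 365, 2005: 390}
    a = to_classes(zero_pad(home_care, 2000, 2005))
    b = to_classes(zero_pad(emergency, 2000, 2005))
    stat, p = chi2_2x2(a, b)
    print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

With these made-up counts the statistic is about 0.667 and p is about 0.41. With only six years every expected cell count is below 5, so the chi-squared approximation is rough here; the example only shows the mechanics, including how a zero-padded year can pull a series into the lower class.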
The prediction performance of the associations was measured using the Receiver Operating Characteristic (ROC), the Area Under the Curve (AUC), and Accuracy (ACC), as well as the Chi-Squared and Student’s t tests. Co-plots and comparison tables for the RF, CART, and LGR methods, together with the p-values from the tests and the Information Exchange (IE/MIE) measures, show the relative performance of the ML methods and of the statistical tests, as well as their behaviour at different learning ratios. The impact of initial groupings by k-nearest neighbours (k-NN) classification, Cross-Correlation (CC), and C-Means (CM) was also studied over limited periods and over the entire span. CART generally trailed RF and LGR, but in some notable cases LGR reached an AUC of 0, falling below CART, while the ACC was as high as 0.912. This shows that ML methods can be confused by zero padding, by irregularities in the data, or by outliers. On average, three linear predictors were sufficient; LGR competed well with RF, and CART followed with the same performance at higher learning ratios. Services were packed only when the p-value of their association coefficient exceeded 0.05. Relationships involving social factors were observed between home care services and the treatment of older people, low birth weight, alcoholism, drug abuse, and emergency admissions. The work found that different HSc services can be packed well into plans of limited duration, across various service sectors and learning configurations, as confirmed by the statistical hypothesis tests.
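To illustrate how a high ACC can coexist with an AUC of 0, the following toy Python sketch computes AUC as the probability that a randomly chosen positive case outscores a randomly chosen negative one (ties count one half), and ACC at an assumed threshold of 0.5. The scores are hypothetical and do not reconstruct the paper's result. Strong class imbalance, such as many zero-padded years falling into one class, is offered here only as one plausible mechanism, as an assumption.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive case outscores a random negative one (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label (threshold is an assumption)."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical, heavily imbalanced toy data: ten negatives and one positive,
# with the positive given the lowest score of all.
TOY_LABELS = [0] * 10 + [1]
TOY_SCORES = [0.2, 0.25, 0.3, 0.3, 0.35, 0.4, 0.2, 0.3, 0.25, 0.4, 0.1]

if __name__ == "__main__":
    print(f"AUC = {auc(TOY_LABELS, TOY_SCORES):.3f}")       # every pair is ranked the wrong way
    print(f"ACC = {accuracy(TOY_LABELS, TOY_SCORES):.3f}")  # all cases are predicted negative
```

Here the 0.5 threshold labels every case negative, so ACC is 10/11 (about 0.909) even though the ranking is completely inverted and AUC is 0. The pairwise loop is quadratic in the number of cases, which is acceptable for short yearly series; a rank-based computation would normally be used for larger data.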
Keywords: Class, cohorts, data frames, grouping, prediction, probabilities, services.