A Machine Learning-based Analysis of Autism Prevalence Rates across US States against Multiple Potential Explanatory Variables

Authors: Ronit Chakraborty, Sugata Banerji


The reported prevalence of Autism Spectrum Disorder (ASD) among children in the US has increased markedly over the past two decades. This study analyzes the growth in state-level ASD prevalence against 45 potentially explanatory factors, including socio-economic, demographic, healthcare, public policy, and political factors. The goal is to determine whether these factors have adequate predictive power in modeling the differential growth in ASD prevalence across states and, if they do, which factors are the most influential. The key findings are: (1) the chosen feature set has considerable power in predicting the growth in ASD prevalence; (2) the most influential predictive factors are identified; (3) the nature of the most influential predictive variables indicates that a considerable portion of the reported differences in ASD prevalence across states could be attributable to over- and under-diagnosis; and (4) Florida is identified as a key outlier state, pointing to potential under-diagnosis of ASD there.

Keywords: Autism Spectrum Disorder, ASD, clustering, Machine Learning, predictive modeling.
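
The abstract does not specify how the influential factors were ranked, so the following is only a rough illustration of the general idea of scoring candidate explanatory variables against a state-level target. It uses absolute Pearson correlation as a simple stand-in for a model-based importance score, which is an assumption and not the authors' method. The data is synthetic, and names such as `factor_0`, `rank_features`, and `pearson` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no defined correlation; treat it as zero.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def rank_features(features, target):
    """Return (name, |r|) pairs sorted from strongest to weakest association."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic stand-in data (assumption): 50 "states" and 45 hypothetical factors.
rng = random.Random(0)
n_states, n_factors = 50, 45
features = {
    f"factor_{j}": [rng.gauss(0, 1) for _ in range(n_states)]
    for j in range(n_factors)
}

# By construction, only factor_0 and factor_1 drive the synthetic target.
target = [
    4.0 * features["factor_0"][i]
    + 1.5 * features["factor_1"][i]
    + rng.gauss(0, 0.1)
    for i in range(n_states)
]

ranking = rank_features(features, target)
for name, score in ranking[:5]:
    print(f"{name}: |r| = {score:.2f}")
```

With only about 50 observations and 45 candidate variables, some unrelated factors will show moderate correlations purely by chance, so a ranking like this is best read as exploratory rather than as evidence of causation.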


