An Application for Risk of Crime Prediction Using Machine Learning

Authors: Luis Fonseca, Filipe Cabral Pinto, Susana Sargento


The growth of the world population, especially in large urban centers, has created new challenges, particularly in the control and optimization of public safety. This work therefore proposes a solution for predicting criminal occurrences in a city based on historical incident data and demographic information. The entire research and implementation process is presented, from data collection at the original source, through the treatment and transformations applied to the data and the choice and evaluation of the Machine Learning model, up to the application layer. Classification models are implemented to predict criminal risk for a given time interval and location. Machine Learning algorithms such as Random Forest, Neural Networks, K-Nearest Neighbors and Logistic Regression are used to predict occurrences, and their performance is compared according to the data processing and transformation used. The results show that Machine Learning techniques help to anticipate criminal occurrences, which can contribute to the reinforcement of public security. Finally, the models were deployed on a platform that provides an API through which other entities can request predictions in real time. An application that displays the crime predictions visually is also presented.

Keywords: Crime prediction, machine learning, public safety, smart city.
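
The model comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn as the toolkit and uses a synthetic, imbalanced binary dataset in place of the real incident and demographic features. The hyperparameters, the F1 metric, the 5-fold cross-validation and the helper name `compare_models` are illustrative choices, not the authors' configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption) for engineered features such as
# time interval, location and neighborhood demographics.
# Label 1 = "crime risk", label 0 = "no risk"; classes are imbalanced.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)

# The four algorithm families named in the abstract (settings are illustrative).
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}


def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated F1 score for each model (hypothetical helper)."""
    scores = {}
    for name, model in models.items():
        # Scaling is fitted inside each fold, so no test data leaks into training.
        pipeline = make_pipeline(StandardScaler(), model)
        scores[name] = cross_val_score(pipeline, X, y, cv=folds, scoring="f1").mean()
    return scores


scores = compare_models(models, X, y)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean F1 = {score:.3f}")
```

Wrapping each model in a pipeline with the scaler keeps the preprocessing identical across models, so differences in score reflect the algorithm rather than the data preparation. Swapping the scaler for another transformation would be one way to compare performance "according to the data processing and transformation used".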


