Predictive Analytics of Student Performance Determinants in Education

Authors: Mahtab Davari, Charles Edward Okon, Somayeh Aghanavesi


Educational institutions are naturally interested in the performance of their enrolled students, since the level of that performance shapes how an institution delivers its academic services. The focus of this paper is to evaluate students' academic performance in given courses of study using machine learning methods. The study compared several supervised classification algorithms, namely Logistic Regression (LR), Support Vector Machine (SVM), Random Forest, Decision Tree, K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis (QDA), using selected features to predict student performance. The accuracy, precision, recall, and F1 score obtained from 5-fold cross-validation were used to identify the best classification algorithm for predicting students' performance. SVM with a linear kernel, LDA, and LR were identified as the best-performing methods. Using the LR model, the study also identified students' educational habits, such as reading and paying attention in class, as strong determinants of above-average performance; other important features include the student's academic history and work. Demographic factors such as age, gender, and high school graduation had no significant effect on a student's performance.

Keywords: Student performance, supervised machine learning, prediction, classification, cross-validation.
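The model-comparison workflow described in the abstract (several supervised classifiers scored by 5-fold cross-validation) can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data; the paper's actual student dataset, feature set, and hyperparameters are not reproduced here.

```python
# Sketch of the comparison: train each classifier named in the abstract
# and score it with 5-fold cross-validation. Synthetic data stands in
# for the paper's student-performance dataset.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Binary target (e.g. above-average vs. below-average performance).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM (linear kernel)": SVC(kernel="linear"),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
}

# Mean F1 across the 5 folds; accuracy, precision, and recall can be
# obtained the same way by changing the `scoring` argument.
results = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in models.items()
}
for name, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} mean F1 = {score:.3f}")
```

With a real dataset, the LR model's fitted coefficients (`LogisticRegression.coef_`) could then be inspected to rank feature importance, which is the route the abstract describes for identifying determinants such as reading habits and class attention.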
