Development of an Ensemble Classification Model Based on Hybrid Filter-Wrapper Feature Selection for Email Phishing Detection

Authors: R. B. Ibrahim, M. S. Argungu, I. M. Mungadi


The internet has become an indispensable part of human life. It has made life easier through channels such as email, internet banking, and video conferencing. Email is one of the easiest means of communication and is widely accepted among individuals and organizations globally. Over the decades, however, the security and integrity of this platform have been challenged by malicious activities such as phishing. Phishing emails are designed to fool the recipient into handing over sensitive personal information such as passwords, credit card numbers, account credentials, and social security numbers. This activity has caused substantial financial damage to email users globally, which has resulted in bankruptcy, sudden death of victims, and other health-related problems. Although many methods have been proposed to detect email phishing, this research compares the results of multiple machine-learning methods for predicting email phishing when combined with hybrid filter-wrapper feature selection. The dataset is obtained from the Kaggle online data repository, and three classifiers (decision tree, Naïve Bayes, and logistic regression) are each used as the base learner of a bagging ensemble. All three models performed well, but one outperformed the others: the Decision Tree (CART) bagging ensemble recorded the highest accuracy of 98.13% using PEF (Phishing Essential Features). This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, hybrid, filter-wrapper, phishing.
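
The abstract does not spell out the pipeline, so the following is only a minimal sketch of how a hybrid filter-wrapper selection step might feed a bagging ensemble. Everything in it is an assumption made for illustration: the synthetic binary features stand in for the Kaggle data, the filter score (difference in class-conditional feature rates), the greedy forward wrapper, the Bernoulli Naïve Bayes base learner, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter

def make_toy_data(n=300, seed=0):
    """Synthetic binary features: columns 0 and 1 carry signal, 2-5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = phishing, 0 = legitimate
        row = [int(rng.random() < (0.85 if label else 0.15)),
               int(rng.random() < (0.75 if label else 0.25))]
        row += [rng.randint(0, 1) for _ in range(4)]
        X.append(row)
        y.append(label)
    return X, y

def filter_rank(X, y, k):
    """Filter step: keep the k features whose rate differs most between classes."""
    def score(j):
        pos = [x[j] for x, t in zip(X, y) if t == 1]
        neg = [x[j] for x, t in zip(X, y) if t == 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def train_nb(X, y, cols):
    """Laplace-smoothed Bernoulli naive Bayes restricted to the given columns."""
    model = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        prior = (len(rows) + 1) / (len(X) + 2)
        probs = {j: (sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in cols}
        model[c] = (prior, probs)
    return model

def predict_nb(model, x):
    def log_score(c):
        prior, probs = model[c]
        return math.log(prior) + sum(
            math.log(p if x[j] else 1 - p) for j, p in probs.items())
    return max((0, 1), key=log_score)

def train_bagging(X, y, cols, n_estimators=15, seed=0):
    """Bagging: fit each base learner on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_nb([X[i] for i in idx], [y[i] for i in idx], cols))
    return models

def predict_bagging(models, x):
    """Combine the base learners by majority vote."""
    return Counter(predict_nb(m, x) for m in models).most_common(1)[0][0]

def accuracy(models, X, y):
    return sum(predict_bagging(models, x) == t for x, t in zip(X, y)) / len(y)

def wrapper_select(X_tr, y_tr, X_val, y_val, candidates):
    """Wrapper step: greedy forward selection scored by validation accuracy."""
    selected, best, improved = [], 0.0, True
    while improved:
        improved = False
        for j in candidates:
            if j in selected:
                continue
            acc = accuracy(train_bagging(X_tr, y_tr, selected + [j]), X_val, y_val)
            if acc > best:
                best, best_j, improved = acc, j, True
        if improved:
            selected.append(best_j)
    return selected, best

X, y = make_toy_data()
X_tr, y_tr, X_val, y_val = X[:200], y[:200], X[200:], y[200:]
candidates = filter_rank(X_tr, y_tr, k=3)
selected, val_acc = wrapper_select(X_tr, y_tr, X_val, y_val, candidates)
print("filter kept:", candidates)
print("wrapper kept:", selected, "validation accuracy:", round(val_acc, 3))
```

The filter step is cheap and model-independent, so it prunes the feature set before the more expensive wrapper step retrains the ensemble for each candidate subset. On a real dataset, library implementations (for example scikit-learn's `BaggingClassifier`) would likely be a more practical choice than this hand-rolled version.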


