Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Thanh Nguyen; Andrei Doncescu; Pierre Siegel

Abstract:

Classification is an important data mining technique and can be used for data filtering in artificial intelligence. Because classification applies to all kinds of data, it is used in nearly every field of modern life. Classification groups items according to the features judged to be interesting and useful. In this paper, we compare two classification methods, Naïve Bayes and ADTree, for detecting spam e-mail. This choice is motivated by the fact that the Naïve Bayes algorithm is based on probability calculus, while the ADTree algorithm is based on decision trees. The parameters of both classifiers are set to maximize the true positive rate and minimize the false positive rate. The experimental results present classification accuracy and a cost analysis with a view to choosing the optimal classifier for spam detection. The results also identify the number of attributes that gives a good tradeoff between the number of attributes used and the classification accuracy.
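The evaluation quantities named in the abstract (true positive rate, false positive rate, accuracy, and cost) can be made concrete with a small sketch. The Python below is a minimal illustration under stated assumptions, not the authors' experimental setup: it trains a Bernoulli Naïve Bayes classifier (a variant chosen here for brevity, not necessarily the one evaluated in the paper) on a hypothetical toy data set with three binary attributes, then derives the metrics from the confusion matrix. The attribute meanings, the smoothing constant `alpha`, and the cost weights `cost_fp=10.0` and `cost_fn=1.0` are all assumptions; weighting false positives more heavily reflects the assumption that losing a legitimate message is worse than letting a spam message through.

```python
import math
from collections import Counter

# Hypothetical binary attributes: [contains "free", contains "$", contains "meeting"].
# Labels: 1 = spam, 0 = legitimate (ham). Toy data for illustration only.
X_train = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [1, 0, 1]]
y_train = [1, 1, 1, 0, 0, 0]
X_test = [[1, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1], [1, 1, 1]]
y_test = [1, 1, 0, 0, 0]


def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and per-class P(attribute = 1) with additive (Laplace) smoothing."""
    counts = Counter(y)
    priors = {c: n / len(y) for c, n in counts.items()}
    likelihoods = {}
    for c in counts:
        rows = [x for x, label in zip(X, y) if label == c]
        # Smoothing keeps every probability strictly between 0 and 1.
        likelihoods[c] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(len(X[0]))
        ]
    return priors, likelihoods


def predict(model, x):
    """Return the class with the highest log posterior (up to a shared constant)."""
    priors, likelihoods = model
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        # Naive assumption: attributes are conditionally independent given the class,
        # so the log-likelihood is a sum of per-attribute log-probabilities.
        score = math.log(prior)
        for xj, pj in zip(x, likelihoods[c]):
            score += math.log(pj if xj else 1.0 - pj)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def evaluate(y_true, y_pred, cost_fp=1.0, cost_fn=1.0):
    """Confusion-matrix metrics plus a weighted misclassification cost."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "tpr": tp / (tp + fn) if tp + fn else 0.0,  # share of spam correctly caught
        "fpr": fp / (fp + tn) if fp + tn else 0.0,  # share of ham wrongly flagged
        "accuracy": (tp + tn) / len(pairs),
        "cost": cost_fp * fp + cost_fn * fn,
    }


model = train_bernoulli_nb(X_train, y_train)
y_pred = [predict(model, x) for x in X_test]
metrics = evaluate(y_test, y_pred, cost_fp=10.0, cost_fn=1.0)
print(y_pred, metrics)
```

With these toy values the classifier catches both spam messages (TPR = 1.0) but flags one of the three legitimate messages (FPR ≈ 0.33), so that single false positive accounts for the whole cost of 10.0. Raising `cost_fp` relative to `cost_fn` is one way to express a preference for a low false positive rate when comparing classifiers.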

Keywords: Classification, data mining, spam filtering, naive Bayes, decision tree.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1124533
