DWM-CDD: Dynamic Weighted Majority Concept Drift Detection for Spam Mail Filtering

Authors: Leili Nosrati, Alireza Nemaney Pour

Abstract:

Although e-mail is the most efficient and popular communication method, unwanted and unsolicited mass e-mails, also called spam mail, threaten the viability of the mail system. This paper proposes a new algorithm called Dynamic Weighted Majority Concept Drift Detection (DWM-CDD) for content-based filtering. DWM-CDD has two design goals: first, to improve on the accuracy of previously proposed algorithms, and second, to reduce the time needed to construct the model. The results show that DWM-CDD can detect both sudden and gradual changes quickly and accurately. Moreover, the time needed for model construction is less than that of previously proposed algorithms.
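
The abstract does not spell out how DWM-CDD works, so the following is only a rough sketch of the generic Dynamic Weighted Majority idea (Kolter and Maloof) that the algorithm's name builds on, not the authors' method. It assumes a toy word-count naive Bayes base learner and the labels "spam" and "ham"; the class names `WordCountExpert` and `DynamicWeightedMajority` and the default values of `beta`, `theta`, and `period` are hypothetical choices made for illustration.

```python
from collections import Counter
import math


class WordCountExpert:
    """Toy incremental naive Bayes over word counts (hypothetical base learner)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def predict(self, words):
        if not self.doc_counts:
            return "ham"  # arbitrary default before any training
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"])) + 1
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in ("spam", "ham"):
            # Laplace-smoothed log prior plus log likelihood of each word
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / (total_words + vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def learn(self, words, label):
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1


class DynamicWeightedMajority:
    """Generic DWM ensemble (assumed parameters: beta, theta, period)."""

    def __init__(self, beta=0.5, theta=0.01, period=1):
        self.beta, self.theta, self.period = beta, theta, period
        self.experts = [WordCountExpert()]
        self.weights = [1.0]
        self.step = 0

    def update(self, words, label):
        """Predict one labelled message, then learn from it; returns the prediction."""
        self.step += 1
        check = self.step % self.period == 0
        votes = Counter()
        for i, expert in enumerate(self.experts):
            guess = expert.predict(words)
            if guess != label and check:
                self.weights[i] *= self.beta  # penalise an expert that erred
            votes[guess] += self.weights[i]
        prediction = max(votes, key=votes.get)
        if check:
            # Rescale so the best expert has weight 1, then prune weak experts
            top = max(self.weights)
            self.weights = [w / top for w in self.weights]
            keep = [i for i, w in enumerate(self.weights) if w >= self.theta]
            self.experts = [self.experts[i] for i in keep]
            self.weights = [self.weights[i] for i in keep]
            if prediction != label:
                # An ensemble-level mistake suggests drift: add a fresh expert
                self.experts.append(WordCountExpert())
                self.weights.append(1.0)
        for expert in self.experts:
            expert.learn(words, label)
        return prediction
```

Feeding the ensemble a stream in which the meaning of a word flips (for example, "offer" first marks spam and later marks legitimate mail) shows the intended behaviour: experts trained on the old concept lose weight and are pruned, while newly added experts take over. A dedicated drift detector, as the paper's title suggests, would sit on top of or alongside this loop; that part is not sketched here.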

Keywords: Concept drift, Content-based filtering, E-mail, Spam mail.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1082750

