A Comparative Study of Malware Detection Techniques Using Machine Learning Methods

Cristina Vatamanu; Doina Cosovan; Dragoş Gavriluţ; Henri Luchian



Abstract:

In the past few years, the amount of malicious software has increased exponentially, and machine learning algorithms have therefore become instrumental in separating clean files from malware through (semi-)automated classification. When working with very large datasets, the major challenge is to reach both a very high malware detection rate and a very low false positive rate. Another challenge is to minimize the time the machine learning algorithm needs to do so. This paper presents a comparative study of different machine learning techniques, such as linear classifiers, ensembles, decision trees, and various hybrids thereof. The training dataset consists of approximately 2 million clean files and 200,000 infected files, which is a realistic quantitative mixture. The paper investigates the above-mentioned methods with respect to both their performance (detection rate and false positive rate) and their practicability.
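The two performance measures named in the abstract can be made concrete with a small sketch. It assumes the usual definitions (detection rate as the fraction of malware files flagged, false positive rate as the fraction of clean files flagged) and an assumed label convention of 1 for malware and 0 for clean. The function names and the sample data are illustrative and are not taken from the paper.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_positive_rate).

    Assumed convention: label 1 = malware, label 0 = clean.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean passed

    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


def expected_false_alarms(clean_files, false_positive_rate):
    """Number of clean files expected to be flagged at a given rate."""
    return round(clean_files * false_positive_rate)


if __name__ == "__main__":
    # Toy labels, purely illustrative: 4 malware files and 6 clean files.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    dr, fpr = detection_metrics(y_true, y_pred)
    print(f"detection rate: {dr:.2f}, false positive rate: {fpr:.3f}")

    # A hypothetical 0.1% false positive rate on 2 million clean files.
    print(expected_false_alarms(2_000_000, 0.001))
```

The second helper shows why the false positive rate matters at this scale: with about 2 million clean files, even a hypothetical rate of 0.1% would flag roughly 2,000 clean files.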

Keywords: Detection Rate, False Positives, Perceptron, One Side Class, Ensembles, Decision Tree, Hybrid Methods, Feature Selection.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1100939

