An Efficient Framework to Build Up Malware Dataset

Madihah Mohd Saudi; Zul Hilmi Abdullah

Abstract:

This paper presents a framework for building a malware dataset. Many researchers spend a long time cleaning noise from a dataset or transforming it into a format that can be used directly for testing. This research therefore proposes a framework to help researchers speed up the malware dataset cleaning processes, so that the cleaned dataset can later be used for testing. An efficient malware dataset cleaning process is believed to improve the quality of the data, and thereby the accuracy and efficiency of the subsequent analysis. In addition, an in-depth understanding of malware taxonomy is important before and during the dataset cleaning processes. A new Trojan classification is proposed to complement this framework. The experiment was conducted in a controlled lab environment using a dataset from VxHeavens. The framework integrates static and dynamic analyses, an incident response method, and knowledge database discovery (KDD) processes. It can serve as a basic guideline for malware researchers building malware datasets.

Keywords: Dataset, knowledge database discovery (KDD), malware, static and dynamic analyses.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1086689
