Biblio
Nowadays, the digitization of the world is under a serious threat due to the emergence of various new and complex malware every day. Due to this, the traditional signature-based methods for detection of malware effectively become an obsolete method. The efficiency of the machine learning techniques in context to the detection of malwares has been proved by state-of-the-art research works. In this paper, we have proposed a framework to detect and classify different files (e.g., exe, pdf, php, etc.) as benign and malicious using two level classifier namely, Macro (for detection of malware) and Micro (for classification of malware files as a Trojan, Spyware, Ad-ware, etc.). Our solution uses Cuckoo Sandbox for generating static and dynamic analysis report by executing the sample files in the virtual environment. In addition, a novel feature extraction module has been developed which functions based on static, behavioral and network analysis using the reports generated by the Cuckoo Sandbox. Weka Framework is used to develop machine learning models by using training datasets. The experimental results using the proposed framework shows high detection rate and high classification rate using different machine learning algorithms
Nowadays, Malware has become a serious threat to the digitization of the world due to the emergence of various new and complex malware every day. Due to this, the traditional signature-based methods for detection of malware effectively becomes an obsolete method. The efficiency of the machine learning model in context to the detection of malware files has been proved by different researches and studies. In this paper, a framework has been developed to detect and classify different files (e.g exe, pdf, php, etc.) as benign and malicious using two level classifier namely, Macro (for detection of malware) and Micro (for classification of malware files as a Trojan, Spyware, Adware, etc.). Cuckoo Sandbox is used for generating static and dynamic analysis report by executing files in the virtual environment. In addition, a novel model is developed for extracting features based on static, behavioral and network analysis using analysis report generated by the Cuckoo Sandbox. Weka Framework is used to develop machine learning models by using training datasets. The experimental results using proposed framework shows high detection rate with an accuracy of 100% using J48 Decision tree model, 99% using SMO (Sequential Minimal Optimization) and 97% using Random Forest tree. It also shows effective classification rate with accuracy 100% using J48 Decision tree, 91% using SMO and 66% using Random Forest tree. These results are used for detecting and classifying unknown files as benign or malicious.