Visible to the public Biblio

Filters: Keyword is Microsoft malware dataset  [Clear All Filters]
2021-09-21
bin Asad, Ashub, Mansur, Raiyan, Zawad, Safir, Evan, Nahian, Hossain, Muhammad Iqbal.  2020.  Analysis of Malware Prediction Based on Infection Rate Using Machine Learning Techniques. 2020 IEEE Region 10 Symposium (TENSYMP). :706–709.
In this modern, technological age, the internet has been adopted by the masses. And with it, the danger of malicious attacks by cybercriminals have increased. These attacks are done via Malware, and have resulted in billions of dollars of financial damage. This makes the prevention of malicious attacks an essential part of the battle against cybercrime. In this paper, we are applying machine learning algorithms to predict the malware infection rates of computers based on its features. We are using supervised machine learning algorithms and gradient boosting algorithms. We have collected a publicly available dataset, which was divided into two parts, one being the training set, and the other will be the testing set. After conducting four different experiments using the aforementioned algorithms, it has been discovered that LightGBM is the best model with an AUC Score of 0.73926.
2020-10-29
Choi, Seok-Hwan, Shin, Jin-Myeong, Liu, Peng, Choi, Yoon-Ho.  2019.  Robustness Analysis of CNN-based Malware Family Classification Methods Against Various Adversarial Attacks. 2019 IEEE Conference on Communications and Network Security (CNS). :1—6.

As malware family classification methods, image-based classification methods have attracted much attention. Especially, due to the fast classification speed and the high classification accuracy, Convolutional Neural Network (CNN)-based malware family classification methods have been studied. However, previous studies on CNN-based classification methods focused only on improving the classification accuracy of malware families. That is, previous studies did not consider the cases that the accuracy of CNN-based malware classification methods can be decreased under the existence of adversarial attacks. In this paper, we analyze the robustness of various CNN-based malware family classification models under adversarial attacks. While adding imperceptible non-random perturbations to the input image, we measured how the accuracy of the CNN-based malware family classification model can be affected. Also, we showed the influence of three significant visualization parameters(i.e., the size of input image, dimension of input image, and conversion color of a special character)on the accuracy variation under adversarial attacks. From the evaluation results using the Microsoft malware dataset, we showed that even the accuracy over 98% of the CNN-based malware family classification method can be decreased to less than 7%.

Lo, Wai Weng, Yang, Xu, Wang, Yapeng.  2019.  An Xception Convolutional Neural Network for Malware Classification with Transfer Learning. 2019 10th IFIP International Conference on New Technologies, Mobility and Security (NTMS). :1—5.

In this work, we applied a deep Convolutional Neural Network (CNN) with Xception model to perform malware image classification. The Xception model is a recently developed special CNN architecture that is more powerful with less over- fitting problems than the current popular CNN models such as VGG16. However only a few use cases of the Xception model can be found in literature, and it has never been used to solve the malware classification problem. The performance of our approach was compared with other methods including KNN, SVM, VGG16 etc. The experiments on two datasets (Malimg and Microsoft Malware Dataset) demonstrated that the Xception model can achieve the highest training accuracy than all other approaches including the champion approach, and highest validation accuracy than all other approaches including VGG16 model which are using image-based malware classification (except the champion solution as this information was not provided). Additionally, we proposed a novel ensemble model to combine the predictions from .bytes files and .asm files, showing that a lower logloss can be achieved. Although the champion on the Microsoft Malware Dataset achieved a bit lower logloss, our approach does not require any features engineering, making it more effective to adapt to any future evolution in malware, and very much less time consuming than the champion's solution.