Biblio
As malware family classification methods, image-based classification methods have attracted much attention. Especially, due to the fast classification speed and the high classification accuracy, Convolutional Neural Network (CNN)-based malware family classification methods have been studied. However, previous studies on CNN-based classification methods focused only on improving the classification accuracy of malware families. That is, previous studies did not consider the cases that the accuracy of CNN-based malware classification methods can be decreased under the existence of adversarial attacks. In this paper, we analyze the robustness of various CNN-based malware family classification models under adversarial attacks. While adding imperceptible non-random perturbations to the input image, we measured how the accuracy of the CNN-based malware family classification model can be affected. Also, we showed the influence of three significant visualization parameters(i.e., the size of input image, dimension of input image, and conversion color of a special character)on the accuracy variation under adversarial attacks. From the evaluation results using the Microsoft malware dataset, we showed that even the accuracy over 98% of the CNN-based malware family classification method can be decreased to less than 7%.
In this work, we applied a deep Convolutional Neural Network (CNN) with Xception model to perform malware image classification. The Xception model is a recently developed special CNN architecture that is more powerful with less over- fitting problems than the current popular CNN models such as VGG16. However only a few use cases of the Xception model can be found in literature, and it has never been used to solve the malware classification problem. The performance of our approach was compared with other methods including KNN, SVM, VGG16 etc. The experiments on two datasets (Malimg and Microsoft Malware Dataset) demonstrated that the Xception model can achieve the highest training accuracy than all other approaches including the champion approach, and highest validation accuracy than all other approaches including VGG16 model which are using image-based malware classification (except the champion solution as this information was not provided). Additionally, we proposed a novel ensemble model to combine the predictions from .bytes files and .asm files, showing that a lower logloss can be achieved. Although the champion on the Microsoft Malware Dataset achieved a bit lower logloss, our approach does not require any features engineering, making it more effective to adapt to any future evolution in malware, and very much less time consuming than the champion's solution.