Malware Classification Using Deep Convolutional Neural Networks

Submitted by grigby1 on Mon, 06/10/2019 - 1:01pm

Title	Malware Classification Using Deep Convolutional Neural Networks
Publication Type	Conference Paper
Year of Publication	2018
Authors	Kornish, D., Geary, J., Sansing, V., Ezekiel, S., Pearlstein, L., Njilla, L.
Conference Name	2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)
Date Published	Oct. 2018
Publisher	IEEE
ISBN Number	978-1-5386-9306-3
Keywords	accuracy levels, classification, classifier, convolutional neural nets, convolutional neural network, convolutional neural networks, Correlation, DCNN classification, deep convolutional neural networks, deep learning techniques, feature extraction, Gray-scale, Hamming distance, Human Behavior, image classification, image features, image representation, image type, improved image representation, invasive software, learning (artificial intelligence), learning models, machine learning, Malware, malware binaries, malware classification, malware images, Metrics, Neural Network, object detection, Pattern recognition, privacy, pubcrawl, resilience, Resiliency, support vector machine, support vector machine classifier training, Support vector machines, transfer learning, visual patterns, visualization
Abstract	In recent years, deep convolutional neural networks (DCNNs) have won many contests in machine learning, object detection, and pattern recognition. Furthermore, deep learning techniques have achieved exceptional performance in image classification, reaching accuracy levels beyond human capability. Malware variants from similar categories often contain similarities due to code reuse. Converting malware samples into images can cause these patterns to manifest as image features, which can be exploited for DCNN classification. Techniques for converting malware binaries into images for visualization and classification have been reported in the literature, and while these methods do reach a high level of classification accuracy on training datasets, they tend to be vulnerable to overfitting and perform poorly on previously unseen samples. In this paper, we explore and document a variety of techniques for representing malware binaries as images with the goal of discovering a format best suited for deep learning. We implement a database for malware binaries from several families, stored in hexadecimal format. These malware samples are converted into images using various approaches and are used to train a neural network to recognize visual patterns in the input and classify malware based on the feature vectors. Each image type is assessed using a variety of learning models, such as transfer learning with existing DCNN architectures and feature extraction for support vector machine classifier training. Each technique is evaluated in terms of classification accuracy, result consistency, and time per trial. Our preliminary results indicate that improved image representation has the potential to enable more effective classification of new malware.
URL	https://ieeexplore.ieee.org/document/8707429
DOI	10.1109/AIPR.2018.8707429
Citation Key	kornish_malware_2018
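
Illustrative sketch (not taken from the paper): the abstract says that malware binaries stored in hexadecimal format are converted into images, but this record does not describe the specific conversion schemes the authors compared. The Python below shows one simple mapping of that general kind, assumed here purely for illustration: each byte becomes one grayscale pixel (0-255), and the bytes are wrapped into rows of a fixed width. The function names, the default width of 16, and the zero-padding of the last row are all hypothetical choices, not the authors' method.

```python
def hex_to_grayscale_rows(hex_text: str, width: int = 16) -> list[list[int]]:
    """Map a hex dump to rows of 0-255 pixel intensities, one byte per pixel.

    Assumption for illustration: a fixed row width with a zero-padded tail.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # bytes.fromhex ignores whitespace between byte pairs.
    data = bytes.fromhex(hex_text)
    # Zero-pad the tail so that the final row is full width.
    padding = (-len(data)) % width
    data += bytes(padding)
    return [list(data[i:i + width]) for i in range(0, len(data), width)]


def to_pgm(rows: list[list[int]]) -> bytes:
    """Serialize pixel rows as a binary PGM (P5) grayscale image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + b"".join(bytes(row) for row in rows)


if __name__ == "__main__":
    # A harmless five-byte sample, not real malware.
    rows = hex_to_grayscale_rows("4d5a9000 03", width=4)
    for row in rows:
        print(row)
    print(len(to_pgm(rows)), "bytes of PGM data")
```

The resulting PGM bytes can be written to a file and opened in most image viewers. The row width determines which byte offsets end up vertically adjacent, so it is one of the representation choices a study like this one might vary.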
