Enhancing Image-Based Malware Classification Using Semi-Supervised Learning

Title: Enhancing Image-Based Malware Classification Using Semi-Supervised Learning
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Abdelmonem, Salma; Seddik, Shahd; El-Sayed, Rania; Kaseb, Ahmed S.
Conference Name: 2021 3rd Novel Intelligent and Leading Emerging Sciences Conference (NILES)
Keywords: CNN, Costs, Data models, Gray-scale, Human Behavior, image-based, LeNet, Malware, malware classification, Malware Visualization, Π-Model, Predictive Metrics, privacy, pubcrawl, Resiliency, ResNet, semi-supervised learning, Semisupervised learning, supervised learning
Abstract: Malicious software (malware) creators constantly mutate malware files to avoid detection, resulting in hundreds of millions of new malware files every year. Most of these files remain unlabeled because of the time and cost needed to label them manually. This makes it very challenging to perform malware detection, i.e., deciding whether a file is malware or not, and malware classification, i.e., determining the family of the malware. Most solutions use supervised learning (e.g., ResNet and VGG), whose accuracy degrades significantly when labeled data is scarce. To solve this problem, this paper proposes a semi-supervised learning model for image-based malware classification. In this model, malware files are represented as grayscale images, and semi-supervised learning is used to exploit the large volume of unlabeled data. The proposed model is an enhanced version of the Π-model that is more accurate and consistent. Experiments show that the proposed model outperforms the original Π-model by 4% in accuracy and three supervised models by 6% in accuracy, especially when the ratio of labeled samples is as low as 20%.
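The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed image width of 256, and the zero padding are assumptions made for illustration, and the record does not describe the paper's preprocessing or its enhancement of the Π-model. The first function maps each byte of a file to one grayscale pixel; the second computes the mean squared difference between two predictions for the same input, which is the kind of consistency term the Π-model is commonly described as applying to unlabeled samples.

```python
import math


def bytes_to_grayscale(data, width=256):
    """Map each byte (0-255) to one pixel intensity, filling rows left to right.

    `width` is an assumed, hypothetical default; the last row is zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # bytes(n) creates n zero bytes, used here as padding for the final row.
    padded = bytes(data) + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def consistency_loss(pred_a, pred_b):
    """Mean squared difference between two class-probability vectors.

    In a Pi-model style setup, pred_a and pred_b would come from two
    stochastic passes (augmentation, dropout) over the same input.
    """
    if len(pred_a) != len(pred_b) or not pred_a:
        raise ValueError("predictions must be non-empty and equally long")
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)


if __name__ == "__main__":
    image = bytes_to_grayscale(bytes([0, 255, 16, 32, 7]), width=2)
    print(image)
    print(consistency_loss([0.7, 0.3], [0.6, 0.4]))
```

Because the consistency term needs no label, it can be evaluated on every sample, which is what allows a semi-supervised model to draw on unlabeled files.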
DOI: 10.1109/NILES53778.2021.9600511
Citation Key: abdelmonem_enhancing_2021