Ben Abdel Ouahab, Ikram, Elaachak, Lotfi, Alluhaidan, Yasser A., Bouhorma, Mohammed.
2021.
A new approach to detect next generation of malware based on machine learning. 2021 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). :230–235.
Malware attacks now target many kinds of devices, including IoT devices, mobile phones, servers, and even the cloud. They cause hardware damage and financial losses, especially for large companies, and represent a serious issue for cybersecurity specialists. In this paper, we propose a new approach to detect unknown malware families based on machine learning classification and a visualization technique. A malware binary is converted to a grayscale image; then, for each image, a GIST descriptor is used as input to the machine learning model. For the malware classification part we use three machine learning algorithms. These classifiers are efficient, with the highest precision reaching 98%. Once we have trained, tested, and evaluated the models, we simulate two new malware families. We do not expect a good prediction, since the models have not seen these families; our goal is instead to analyze the behavior of our classifiers when faced with a new family. Finally, we propose an approach that uses a filter to determine whether a classification is normal or indicates zero-day malware.
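The binary-to-grayscale step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `bytes_to_grayscale`, the default row width of 256, and the zero-padding of the last row are all assumptions, and the GIST descriptor stage is not shown.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Lay raw bytes out as rows of `width` pixels, one byte per pixel (0-255).

    The final row is zero-padded so that every row has the same length.
    The width and padding policy are assumptions made for this sketch.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Ten bytes at width 4 give three rows, the last one padded with zeros.
image = bytes_to_grayscale(bytes(range(10)), width=4)
```

Each inner list is one image row; a real pipeline would typically hand this array to an image library before computing descriptors.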
Osman, Mohd Zamri, Abidin, Ahmad Firdaus Zainal, Romli, Rahiwan Nazar, Darmawan, Mohd Faaizie.
2021.
Pixel-based Feature for Android Malware Family Classification using Machine Learning Algorithms. 2021 International Conference on Software Engineering Computer Systems and 4th International Conference on Computational Science and Information Management (ICSECS-ICOCSIM). :552–555.
‘Malicious software’, or malware, is a serious threat to the security and privacy of all mobile phone users. The popularity of smartphones, primarily Android, makes them a very viable target for spreading malware. Many past solutions have proved ineffective and have resulted in many false positives. The ability to identify and classify malware helps prevent it from spreading and evolving. In this paper, we study the effectiveness of classifying malware families using pixel-level features. This study implements well-known machine learning and deep learning classifiers such as K-Nearest Neighbours (k-NN), Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree, and Random Forest. Binary files from 25 malware families are converted into fixed-size grayscale images. Each 100x100 grayscale image is then flattened into a single row of 100000 columns. During this phase, none of the columns are removed, so as to preserve the patterns in each malware family. The experimental results show that our approach achieved 92% accuracy with Random Forest, 88% with SVM, 81% with Decision Tree, 80% with k-NN, and 56% with the Naïve Bayes classifier. Overall, the pixel-based feature is a promising technique for identifying malware families with high accuracy, especially with the Random Forest classifier.
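One way to picture the pixel-level feature row described in this abstract is the sketch below. It is an assumption-laden illustration: the abstract does not say how binaries are brought to a fixed size, so truncation and zero-padding are used here as a stand-in, and `pixel_features` is a hypothetical name. With `side=100` the function returns 100 × 100 = 10,000 values per sample.

```python
def pixel_features(data: bytes, side: int = 100) -> list[int]:
    """Return a flat row of side*side pixel values (0-255) for one binary.

    Bytes beyond side*side are dropped and short inputs are zero-padded.
    This sizing policy is an assumption, not taken from the paper.
    """
    n = side * side
    clipped = data[:n]
    return list(clipped) + [0] * (n - len(clipped))


# Three bytes in a 2x2 image: the fourth pixel is padding.
row = pixel_features(b"\x4d\x5a\x90", side=2)
```

Rows produced this way can be stacked into a feature matrix, one row per sample, for any of the classifiers listed in the abstract.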
Singh, Shirish, Kaiser, Gail.
2021.
Metamorphic Detection of Repackaged Malware. 2021 IEEE/ACM 6th International Workshop on Metamorphic Testing (MET). :9–16.
Machine learning-based malware detection systems are often vulnerable to evasion attacks, in which a malware developer manipulates their malicious software such that it is misclassified as benign. Such software hides some properties of the real class or adopts some properties of a different class by applying small perturbations. A special case of evasive malware hides by repackaging a bona fide benign mobile app to contain malware in addition to the original functionality of the app, thus retaining most of the benign properties of the original app. We present a novel malware detection system based on metamorphic testing principles that can detect such benign-seeming malware apps. We apply metamorphic testing to the feature representation of the mobile app, rather than to the app itself. That is, the source input is the original feature vector for the app and the derived input is that vector with selected features removed. If the app was originally classified benign, and is indeed benign, the output for the source and derived inputs should be the same class, i.e., benign, but if they differ, then the app is exposed as (likely) malware. Malware apps originally classified as malware should retain that classification, since only features prevalent in benign apps are removed. This approach enables the machine learning model to classify repackaged malware with reasonably few false negatives and false positives. Our training pipeline is simpler than many existing ML-based malware detection methods, as the network is trained end-to-end to jointly learn appropriate features and to perform classification. We pre-trained our classifier model on 3 million apps collected from the widely-used AndroZoo dataset. We perform an extensive study on other publicly available datasets to show our approach's effectiveness in detecting repackaged malware with more than 94% accuracy, 0.98 precision, 0.95 recall, and 0.96 F1 score.
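The metamorphic relation described in this abstract (classify the source feature vector, remove features prevalent in benign apps, classify again, and compare) can be sketched roughly as below. Everything concrete here is hypothetical: `toy_classify`, its weights, and the feature names are stand-ins for a trained model and are not taken from the paper.

```python
from typing import Callable, Mapping, Set

Features = Mapping[str, float]


def metamorphic_check(classify: Callable[[Features], str],
                      features: Features,
                      benign_prevalent: Set[str]) -> str:
    """Apply the source/derived comparison to one app's feature vector."""
    source_label = classify(features)
    if source_label != "benign":
        # Apps already classified as malware keep that label.
        return source_label
    # Derived input: the same vector with benign-prevalent features removed.
    derived = {k: v for k, v in features.items() if k not in benign_prevalent}
    # A label change between source and derived inputs exposes likely malware.
    return "benign" if classify(derived) == "benign" else "malware"


# Hypothetical linear scorer standing in for a trained classifier.
TOY_WEIGHTS = {"uses_camera_ui": -2.0, "has_launcher_icon": -1.5,
               "sends_sms_silently": 3.0}
BENIGN_PREVALENT = {"uses_camera_ui", "has_launcher_icon"}


def toy_classify(features: Features) -> str:
    score = sum(TOY_WEIGHTS.get(name, 0.0) * value
                for name, value in features.items())
    return "malware" if score > 0 else "benign"


repackaged = {"uses_camera_ui": 1.0, "has_launcher_icon": 1.0,
              "sends_sms_silently": 1.0}
clean = {"uses_camera_ui": 1.0, "has_launcher_icon": 1.0}

repackaged_verdict = metamorphic_check(toy_classify, repackaged, BENIGN_PREVALENT)
clean_verdict = metamorphic_check(toy_classify, clean, BENIGN_PREVALENT)
```

In this toy setup the repackaged sample is first scored benign because its benign features outweigh the malicious one; removing them flips the label, which is the signal the relation relies on.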
Khetarpal, Anavi, Mallik, Abhishek.
2021.
Visual Malware Classification Using Transfer Learning. 2021 Fourth International Conference on Electrical, Computer and Communication Technologies (ICECCT). :1–5.
The proliferation of malware attacks hinders cybersecurity, posing a significant threat to our devices. The variety and number of both known and unknown malware make detection difficult. Research suggests that the ramifications of malware are only becoming worse with time, and hence malware analysis becomes crucial. This paper proposes a visual malware classification technique to convert malware executables into their visual representations and obtain grayscale images of malicious files. These grayscale images are then used to classify malicious files into their respective malware families by passing them through deep convolutional neural networks (CNN). As part of deep CNN, we use various ImageNet models and compare their performance.
Wang, Shuwei, Wang, Qiuyun, Jiang, Zhengwei, Wang, Xuren, Jing, Rongqi.
2021.
A Weak Coupling of Semi-Supervised Learning with Generative Adversarial Networks for Malware Classification. 2020 25th International Conference on Pattern Recognition (ICPR). :3775–3782.
Malware classification helps to understand the purpose of malware and is also an important part of attack detection. Due to the continuous innovation and development of artificial intelligence, combining deep learning with malware classification has become a trend. In this paper, we propose an improved malware image rescaling algorithm (IMIR) based on the local mean algorithm. The main goal of IMIR is to reduce the loss of information from samples during the conversion of binary files to image files. We then construct a neural network structure based on the VGG model, which is suitable for image classification. In the real world, a large number of malware family labels are inaccurate or missing. To deal with this situation, we propose a novel method to train the deep neural network with a Semi-supervised Generative Adversarial Network (SGAN), which needs only a small number of malware samples with accurate family labels. By integrating SGAN with weak coupling, we can retain the weak links between the supervised part and the unsupervised part of SGAN. This improves the accuracy of malware classification by making classifiers more independent of discriminators. The experimental results demonstrate that our model achieves favorable performance. The recall for each family in our data set is higher than 93.75%.
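For orientation, plain local-mean rescaling (block averaging) can be sketched as below. This is the generic baseline technique only; the IMIR improvements are not described in the abstract and are not reproduced here. The function name and the restriction to integer scale factors are assumptions of this sketch.

```python
def local_mean_downscale(image: list[list[int]], factor: int) -> list[list[int]]:
    """Shrink a grayscale image by replacing each factor x factor block
    with the integer (floor) mean of its pixels."""
    if factor <= 0 or not image or not image[0]:
        raise ValueError("need a non-empty image and a positive factor")
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be multiples of factor")
    out = []
    for r in range(0, height, factor):
        row = []
        for c in range(0, width, factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


# A 2x4 image reduced by a factor of 2 becomes a 1x2 image.
small = local_mean_downscale([[0, 2, 10, 10], [4, 6, 10, 10]], factor=2)
```

Averaging lets every source pixel contribute to the output, unlike nearest-neighbour sampling, which is one reason local-mean methods are associated with lower information loss.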
Kumar, Shashank, Meena, Shivangi, Khosla, Savya, Parihar, Anil Singh.
2021.
AE-DCNN: Autoencoder Enhanced Deep Convolutional Neural Network For Malware Classification. 2021 International Conference on Intelligent Technologies (CONIT). :1–5.
Malware classification is a problem of great significance in the domain of information security. This is because the classification of malware into respective families helps in determining their intent, activity, and level of threat. In this paper, we propose a novel deep learning approach to malware classification. The proposed method converts malware executables into image-based representations. These images are then classified into different malware families using an autoencoder enhanced deep convolutional neural network (AE-DCNN). In particular, we propose a novel training mechanism wherein a DCNN classifier is trained with the help of an encoder. We conjecture that using an encoder in the proposed way provides the classifier with the extra information that is perhaps lost during the forward propagation, thereby leading to better results. The proposed approach eliminates the use of feature engineering, reverse engineering, disassembly, and other domain-specific techniques earlier used for malware classification. On the standard Malimg dataset, we achieve a 10-fold cross-validation accuracy of 99.38% and F1-score of 99.38%. Further, due to the texture-based analysis of malware files, the proposed technique is resilient to several obfuscation techniques.
Abdelmonem, Salma, Seddik, Shahd, El-Sayed, Rania, Kaseb, Ahmed S.
2021.
Enhancing Image-Based Malware Classification Using Semi-Supervised Learning. 2021 3rd Novel Intelligent and Leading Emerging Sciences Conference (NILES). :125–128.
Malicious software (malware) creators are constantly mutating malware files in order to avoid detection, resulting in hundreds of millions of new malware files every year. As a result, most malware files are unlabeled due to the time and cost needed to label them manually. This makes it very challenging to perform malware detection, i.e., deciding whether a file is malware or not, and malware classification, i.e., determining the family of the malware. Most solutions use supervised learning (e.g., ResNet and VGG), whose accuracy degrades significantly when labeled data is scarce. To solve this problem, this paper proposes a semi-supervised learning model for image-based malware classification. In this model, malware files are represented as grayscale images, and semi-supervised learning is carefully selected to handle the plethora of unlabeled data. Our proposed model is an enhanced version of the Π-model, which makes it more accurate and consistent. Experiments show that our proposed model outperforms the original Π-model by 4% in accuracy and three other supervised models by 6% in accuracy, especially when the ratio of labeled samples is as low as 20%.
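As background for the model this abstract builds on, the per-sample objective of the original Π-model, as it is commonly described, combines a supervised cross-entropy term (labeled samples only) with a weighted consistency term between two stochastic predictions for the same input. The sketch below shows that baseline objective only, not the enhancement proposed in the paper; the function names and the use of plain Python lists for probability vectors are assumptions.

```python
import math
from typing import Optional, Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


def mean_squared_difference(p: Sequence[float], q: Sequence[float]) -> float:
    """Consistency term: how far apart two predictions for one input are."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def pi_model_loss(pred_a: Sequence[float], pred_b: Sequence[float],
                  label: Optional[int], weight: float) -> float:
    """Loss for one sample; pred_a and pred_b come from two passes over the
    same input under different augmentation or dropout noise."""
    supervised = cross_entropy(pred_a, label) if label is not None else 0.0
    return supervised + weight * mean_squared_difference(pred_a, pred_b)


# Unlabeled sample: only the consistency term contributes.
unlabeled_loss = pi_model_loss([0.5, 0.5], [0.7, 0.3], label=None, weight=2.0)
# Labeled sample with identical predictions: only cross-entropy contributes.
labeled_loss = pi_model_loss([0.5, 0.5], [0.5, 0.5], label=0, weight=2.0)
```

Because the consistency term needs no label, unlabeled malware images can still shape the model, which is what makes this family of methods attractive when labeled data is scarce.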
Lee, Shan-Hsin, Lan, Shen-Chieh, Huang, Hsiu-Chuan, Hsu, Chia-Wei, Chen, Yung-Shiu, Shieh, Shiuhpyng.
2021.
EC-Model: An Evolvable Malware Classification Model. 2021 IEEE Conference on Dependable and Secure Computing (DSC). :1–8.
Malware evolves quickly, as new attack, evasion, and mutation techniques are commonly used by hackers to build new malware families. For malware detection and classification, the multi-class learning model is one of the most popular machine learning models in use. To recognize malicious programs, a multi-class model requires malware types to be predefined as output classes, which cannot be dynamically adjusted after the model is trained. When a new variant or type of malicious program is discovered, the trained multi-class model is no longer valid and has to be retrained completely. This consumes a significant amount of time and resources, and cannot adapt quickly enough to keep up with dynamically evolving malware types. To cope with this problem, this paper proposes an evolvable malware classification deep learning model, namely EC-Model, which can dynamically adapt to new malware types without full retraining. Consequently, the reaction time can be significantly reduced to meet the timeliness requirement of malware classification. To the best of our knowledge, our work is the first attempt to adopt multi-task deep learning for evolvable malware classification.
Gao, Tan, Li, Xudong, Chen, Wen.
2021.
Co-training For Image-Based Malware Classification. 2021 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC). :568–572.
A malware detection model based on semi-supervised learning is proposed in this paper. Our model consists of three main parts: malware visualization, feature extraction, and classification. First, malware visualization converts malware into grayscale images; then the features of the images are extracted to reflect the coding patterns of malware; finally, a collaborative learning model is applied to malware detection using both labeled and unlabeled software samples. The proposed model was evaluated on two commonly used benchmark datasets. The results demonstrate that, compared with traditional methods, our model not only reduces the cost of sample labeling but also improves detection accuracy by incorporating unlabeled samples into the collaborative learning process, thereby achieving higher classification performance.
Keyes, David Sean, Li, Beiqi, Kaur, Gurdip, Lashkari, Arash Habibi, Gagnon, Francois, Massicotte, Frédéric.
2021.
EntropLyzer: Android Malware Classification and Characterization Using Entropy Analysis of Dynamic Characteristics. 2021 Reconciling Data Analytics, Automation, Privacy, and Security: A Big Data Challenge (RDAAPS). :1–12.
The unmatched threat of Android malware has tremendously increased the need for analyzing prominent malware samples. There are remarkable efforts in static and dynamic malware analysis using static features and API calls, respectively. Nonetheless, there is a gap in classifying Android malware by analyzing its behavior using multiple dynamic characteristics. This paper proposes EntropLyzer, an entropy-based behavioral analysis technique for classifying the behavior of 12 prominent Android malware categories and 147 malware families taken from the CCCS-CIC-AndMal2020 dataset. This work uses six classes of dynamic characteristics, including memory, API, network, logcat, battery, and process, to classify and characterize Android malware. Results reveal that the entropy-based analysis successfully determines the behavior of all malware categories and most of the malware families before and after rebooting the emulator.
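The abstract does not say how entropy is computed over the dynamic characteristics, so the sketch below assumes standard Shannon entropy over the observed values of a single feature. The function name and the sample API-call values are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(observations: Iterable[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p) over each distinct value.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical API-call observations for one sample.
evenly_split = shannon_entropy(["open", "open", "send", "send"])
constant = shannon_entropy(["open", "open", "open", "open"])
```

A feature that always takes the same value has zero entropy, while one spread evenly over two values has one bit; comparing such scores across features is one plausible way to rank how informative each characteristic is.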
Or-Meir, Ori, Cohen, Aviad, Elovici, Yuval, Rokach, Lior, Nissim, Nir.
2021.
Pay Attention: Improving Classification of PE Malware Using Attention Mechanisms Based on System Call Analysis. 2021 International Joint Conference on Neural Networks (IJCNN). :1–8.
Malware poses a threat to computing systems worldwide, and security experts work tirelessly to detect and classify malware as accurately and quickly as possible. Since malware can use evasion techniques to bypass static analysis and security mechanisms, dynamic analysis methods are more useful for accurately analyzing the behavioral patterns of malware. Previous studies showed that malware behavior can be represented by sequences of executed system calls and that machine learning algorithms can leverage such sequences for the task of malware classification (a.k.a. malware categorization). Accurate malware classification is helpful for malware signature generation and is thus beneficial to antivirus vendors; this capability is also valuable to organizational security experts, enabling them to mitigate malware attacks and respond to security incidents. In this paper, we propose an improved methodology for malware classification, based on analyzing sequences of system calls invoked by malware in a dynamic analysis environment. We show that adding an attention mechanism to an LSTM model improves accuracy for the task of malware classification, thus outperforming the state-of-the-art algorithm by up to 6%. We also show that the transformer architecture can be used to analyze very long sequences with significantly lower time complexity for training and prediction. Our proposed method can serve as the basis for a decision support system for security experts, for the task of malware categorization.
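To illustrate what adding attention on top of a recurrent model can look like, the sketch below pools a sequence of per-step hidden states (for example, one per system call) into a single context vector using dot-product scores and a softmax. The abstract does not specify the attention variant used, so this form, the names `attention_pool` and `query`, and the plain-list representation are all assumptions.

```python
import math
from typing import Sequence


def softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: Sequence[Sequence[float]],
                   query: Sequence[float]) -> tuple[list[float], list[float]]:
    """Return (context, weights): a weighted sum of hidden states, where each
    weight is the softmax of that state's dot product with the query."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(len(query))]
    return context, weights


# Two identical states receive equal weight.
context, weights = attention_pool([[1.0, 0.0], [1.0, 0.0]], query=[1.0, 0.0])
```

In a trained model the query would be a learned parameter and the context vector would feed the final classification layer, letting the model emphasize the most informative calls in a long trace.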