Ben Abdel Ouahab, Ikram, Elaachak, Lotfi, Alluhaidan, Yasser A., Bouhorma, Mohammed.
2021.
A new approach to detect next generation of malware based on machine learning. 2021 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). :230–235.
Nowadays, malware attacks target many kinds of devices, such as IoT devices, mobile phones, servers, and even the cloud. They cause hardware damage and financial losses, especially for big companies. Malware attacks represent a serious issue for cybersecurity specialists. In this paper, we propose a new approach to detect unknown malware families based on machine learning classification and a visualization technique. A malware binary is converted to a grayscale image, then for each image a GIST descriptor is used as input to the machine learning model. For the malware classification part we use 3 machine learning algorithms. These classifiers are efficient, with the highest precision reaching 98%. Once we train, test and evaluate the models, we simulate 2 new malware families. We do not expect a good prediction since the model has not seen these families; however, our goal is to analyze the behavior of our classifiers when faced with a new family. Finally, we propose an approach using a filter to determine whether a classification is normal or the sample is zero-day malware.
Osman, Mohd Zamri, Abidin, Ahmad Firdaus Zainal, Romli, Rahiwan Nazar, Darmawan, Mohd Faaizie.
2021.
Pixel-based Feature for Android Malware Family Classification using Machine Learning Algorithms. 2021 International Conference on Software Engineering Computer Systems and 4th International Conference on Computational Science and Information Management (ICSECS-ICOCSIM). :552–555.
‘Malicious software’, or malware, has been a serious threat to the security and privacy of all mobile phone users. The popularity of smartphones, primarily Android devices, makes them a very viable target for spreading malware. In the past, many solutions have proved ineffective and have resulted in many false positives. The ability to identify and classify malware will help prevent it from spreading and evolving. In this paper, we study the effectiveness of malware family classification using pixel-level features. This study implements well-known machine learning and deep learning classifiers such as K-Nearest Neighbours (k-NN), Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree, and Random Forest. Binary files from 25 malware families are converted into fixed-size grayscale images. Each 100x100 grayscale image is then flattened into a single row of 10,000 columns. During this phase, no columns are removed, so as to preserve the patterns in each malware family. The experimental results show that our approach achieved 92% accuracy with Random Forest, 88% with SVM, 81% with Decision Tree, 80% with k-NN and 56% with the Naïve Bayes classifier. Overall, the pixel-based feature is a promising technique for identifying the family of malware with high accuracy, especially when using the Random Forest classifier.
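The flattening step in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the evenly spaced sampling used to fit a file of any length into 100x100 pixels are assumptions, since the abstract does not say how the fixed size is obtained. Each byte (0 to 255) becomes one gray level, and a 100x100 image yields 100 × 100 = 10,000 feature columns.

```python
def bytes_to_pixel_features(data: bytes, side: int = 100) -> list[int]:
    """Map a binary to side*side gray levels, returned as one flat row."""
    n_pixels = side * side
    if not data:
        # Assumption: an empty file becomes an all-black image.
        return [0] * n_pixels
    # Assumption: sample the byte stream at evenly spaced offsets so that
    # files of any length fit the fixed image size.
    return [data[i * len(data) // n_pixels] for i in range(n_pixels)]


def to_rows(flat: list[int], side: int = 100) -> list[list[int]]:
    """Reshape the flat row back into a side x side image (for viewing)."""
    return [flat[r * side:(r + 1) * side] for r in range(side)]


sample = bytes(range(256)) * 100          # stand-in for a malware binary
features = bytes_to_pixel_features(sample)
image = to_rows(features)
print(len(features), len(image), len(image[0]))
```

The flat row is what a classifier such as Random Forest would consume, one row per sample and one column per pixel.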
Singh, Shirish, Kaiser, Gail.
2021.
Metamorphic Detection of Repackaged Malware. 2021 IEEE/ACM 6th International Workshop on Metamorphic Testing (MET). :9–16.
Machine learning-based malware detection systems are often vulnerable to evasion attacks, in which a malware developer manipulates their malicious software such that it is misclassified as benign. Such software hides some properties of the real class or adopts some properties of a different class by applying small perturbations. A special case of evasive malware hides by repackaging a bona fide benign mobile app to contain malware in addition to the original functionality of the app, thus retaining most of the benign properties of the original app. We present a novel malware detection system based on metamorphic testing principles that can detect such benign-seeming malware apps. We apply metamorphic testing to the feature representation of the mobile app, rather than to the app itself. That is, the source input is the original feature vector for the app and the derived input is that vector with selected features removed. If the app was originally classified benign, and is indeed benign, the output for the source and derived inputs should be the same class, i.e., benign, but if they differ, then the app is exposed as (likely) malware. Malware apps originally classified as malware should retain that classification, since only features prevalent in benign apps are removed. This approach enables the machine learning model to classify repackaged malware with reasonably few false negatives and false positives. Our training pipeline is simpler than many existing ML-based malware detection methods, as the network is trained end-to-end to jointly learn appropriate features and to perform classification. We pre-trained our classifier model on 3 million apps collected from the widely-used AndroZoo dataset. We perform an extensive study on other publicly available datasets to show our approach's effectiveness in detecting repackaged malware with more than 94% accuracy, 0.98 precision, 0.95 recall, and 0.96 F1 score.
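The metamorphic relation described in this abstract (source input = original feature vector, derived input = the same vector with benign-prevalent features removed) can be sketched as below. This is an illustration under stated assumptions, not the authors' system: `toy_classify`, its weights, and the feature names are hypothetical stand-ins for the trained neural network and real app features.

```python
from typing import Callable, Dict, Set

Features = Dict[str, int]


def metamorphic_verdict(classify: Callable[[Features], str],
                        features: Features,
                        benign_prevalent: Set[str]) -> str:
    """Re-check a 'benign' verdict after removing benign-prevalent features."""
    source_label = classify(features)
    if source_label != "benign":
        # Apps already classified as malware keep that classification.
        return source_label
    # Derived input: the source vector with selected features removed.
    derived = {k: v for k, v in features.items() if k not in benign_prevalent}
    derived_label = classify(derived)
    # A truly benign app should stay benign; a changed label exposes
    # the app as likely malware.
    return "benign" if derived_label == "benign" else "malware"


# Hypothetical stand-in classifier: a weighted sum over feature counts.
WEIGHTS = {"ui_activity": -2, "analytics_sdk": -2, "send_sms": 3, "load_dex": 2}


def toy_classify(features: Features) -> str:
    score = sum(WEIGHTS.get(name, 0) * count for name, count in features.items())
    return "malware" if score > 0 else "benign"


BENIGN_PREVALENT = {"ui_activity", "analytics_sdk"}
repackaged = {"ui_activity": 1, "analytics_sdk": 1, "send_sms": 1}
clean = {"ui_activity": 1, "analytics_sdk": 1}

print(toy_classify(repackaged))
print(metamorphic_verdict(toy_classify, repackaged, BENIGN_PREVALENT))
print(metamorphic_verdict(toy_classify, clean, BENIGN_PREVALENT))
```

In this toy setup the repackaged app passes the plain classifier because its benign features outweigh the malicious one, and the derived input is what reveals it.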
Khetarpal, Anavi, Mallik, Abhishek.
2021.
Visual Malware Classification Using Transfer Learning. 2021 Fourth International Conference on Electrical, Computer and Communication Technologies (ICECCT). :1–5.
The proliferation of malware attacks hinders cybersecurity, thus posing a significant threat to our devices. The variety and number of both known and unknown malware make detection difficult. Research suggests that the ramifications of malware are only becoming worse with time and hence malware analysis becomes crucial. This paper proposes a visual malware classification technique to convert malware executables into their visual representations and obtain grayscale images of malicious files. These grayscale images are then used to classify malicious files into their respective malware families by passing them through deep convolutional neural networks (CNN). As part of deep CNN, we use various ImageNet models and compare their performance.
Wang, Shuwei, Wang, Qiuyun, Jiang, Zhengwei, Wang, Xuren, Jing, Rongqi.
2021.
A Weak Coupling of Semi-Supervised Learning with Generative Adversarial Networks for Malware Classification. 2020 25th International Conference on Pattern Recognition (ICPR). :3775–3782.
Malware classification helps to understand the purpose of malware and is also an important part of attack detection. Due to the continuous innovation and development of artificial intelligence, combining deep learning with malware classification has become a trend. In this paper, we propose an improved malware image rescaling algorithm (IMIR) based on the local mean algorithm. The main goal of IMIR is to reduce the loss of information from samples during the process of converting binary files to image files. We then construct a neural network structure based on the VGG model, which is suitable for image classification. In the real world, a large share of malware family labels are inaccurate or missing. To deal with this situation, we propose a novel method to train the deep neural network with a Semi-supervised Generative Adversarial Network (SGAN), which needs only a small amount of malware with accurate family labels. By integrating SGAN with weak coupling, we can retain the weak links between the supervised part and the unsupervised part of SGAN. It improves the accuracy of malware classification by making classifiers more independent of discriminators. The experimental results demonstrate that our model achieves favorable performance. The recall for each family in our data set is higher than 93.75%.
Kumar, Shashank, Meena, Shivangi, Khosla, Savya, Parihar, Anil Singh.
2021.
AE-DCNN: Autoencoder Enhanced Deep Convolutional Neural Network For Malware Classification. 2021 International Conference on Intelligent Technologies (CONIT). :1–5.
Malware classification is a problem of great significance in the domain of information security. This is because the classification of malware into respective families helps in determining their intent, activity, and level of threat. In this paper, we propose a novel deep learning approach to malware classification. The proposed method converts malware executables into image-based representations. These images are then classified into different malware families using an autoencoder enhanced deep convolutional neural network (AE-DCNN). In particular, we propose a novel training mechanism wherein a DCNN classifier is trained with the help of an encoder. We conjecture that using an encoder in the proposed way provides the classifier with the extra information that is perhaps lost during the forward propagation, thereby leading to better results. The proposed approach eliminates the use of feature engineering, reverse engineering, disassembly, and other domain-specific techniques earlier used for malware classification. On the standard Malimg dataset, we achieve a 10-fold cross-validation accuracy of 99.38% and F1-score of 99.38%. Further, due to the texture-based analysis of malware files, the proposed technique is resilient to several obfuscation techniques.
Abdelmonem, Salma, Seddik, Shahd, El-Sayed, Rania, Kaseb, Ahmed S.
2021.
Enhancing Image-Based Malware Classification Using Semi-Supervised Learning. 2021 3rd Novel Intelligent and Leading Emerging Sciences Conference (NILES). :125–128.
Malicious software (malware) creators are constantly mutating malware files in order to avoid detection, resulting in hundreds of millions of new malware samples every year. As a result, most malware files are unlabeled due to the time and cost needed to label them manually. This makes it very challenging to perform malware detection, i.e., deciding whether a file is malware or not, and malware classification, i.e., determining the family of the malware. Most solutions use supervised learning (e.g., ResNet and VGG), whose accuracy degrades significantly when labeled data is scarce. To solve this problem, this paper proposes a semi-supervised learning model for image-based malware classification. In this model, malware files are represented as grayscale images, and semi-supervised learning is carefully selected to handle the plethora of unlabeled data. Our proposed model is an enhanced version of the ∏-model, which makes it more accurate and consistent. Experiments show that our proposed model outperforms the original ∏-model by 4% in accuracy and three other supervised models by 6% in accuracy, especially when the ratio of labeled samples is as low as 20%.
Lee, Shan-Hsin, Lan, Shen-Chieh, Huang, Hsiu-Chuan, Hsu, Chia-Wei, Chen, Yung-Shiu, Shieh, Shiuhpyng.
2021.
EC-Model: An Evolvable Malware Classification Model. 2021 IEEE Conference on Dependable and Secure Computing (DSC). :1–8.
Malware evolves quickly as new attack, evasion and mutation techniques are commonly used by hackers to build new malware families. For malware detection and classification, the multi-class learning model is one of the most popular machine learning models in use. To recognize malicious programs, a multi-class model requires malware types to be predefined as output classes, which cannot be dynamically adjusted after the model is trained. When a new variant or type of malicious program is discovered, the trained multi-class model is no longer valid and has to be retrained completely. This consumes a significant amount of time and resources, and cannot adapt quickly enough to keep pace with dynamically evolving malware types. To cope with the problem, an evolvable malware classification deep learning model, namely EC-Model, is proposed in this paper, which can dynamically adapt to new malware types without the need for full retraining. Consequently, the reaction time can be significantly reduced to meet the timeliness requirement of malware classification. To the best of our knowledge, our work is the first attempt to adopt multi-task deep learning for evolvable malware classification.
Gao, Tan, Li, Xudong, Chen, Wen.
2021.
Co-training For Image-Based Malware Classification. 2021 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC). :568–572.
A malware detection model based on semi-supervised learning is proposed in this paper. Our model includes three main parts: malware visualization, feature extraction, and classification. First, malware visualization converts malware into grayscale images; then features of the images are extracted to reflect the coding patterns of the malware; finally, a collaborative learning model is applied to malware detection using both labeled and unlabeled software samples. The proposed model was evaluated on two commonly used benchmark datasets. The results demonstrate that, compared with traditional methods, our model not only reduces the cost of sample labeling but also improves detection accuracy by incorporating unlabeled samples into the collaborative learning process, thereby achieving higher classification performance.
Gülmez, Sibel, Sogukpinar, Ibrahim.
2021.
Graph-Based Malware Detection Using Opcode Sequences. 2021 9th International Symposium on Digital Forensics and Security (ISDFS). :1–5.
The impact of malware on IT (information technology) systems grows day by day. Its volume, complexity, and cost increase rapidly. While researchers are developing new and better detection algorithms, attackers are also evolving malware to defeat current detection techniques. Therefore, malware detection has become one of the most challenging tasks in cyber security. To increase the performance of detection techniques, researchers draw on different approaches, but some of them are costly in both time and hardware resources. This situation calls for fast and cheap detection methods. In this context, static analysis offers these benefits, but it is important to keep detection accuracy high while reducing resource consumption. Opcodes (operation codes) are commonly used in static analysis, but feature extraction from opcodes can be difficult since an opcode sequence can be very long. Furthermore, most malware developers use obfuscation and encryption techniques to evade detection methods based on static analysis. This kind of malware is called packed malware and, according to common belief, packed malware must be either unpacked or analyzed dynamically in order to be detected. In this study, a graph-based malware detection method is proposed to overcome these problems. The proposed method relies on obtaining the opcode graph of every executable file in the dataset and using these graphs for feature extraction. In this way, the proposed method reaches up to 98% detection accuracy. In addition, the proposed method makes it possible to detect packed malware without the need for unpacking or dynamic analysis.
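One common way to turn an opcode sequence into a graph is sketched below. The abstract does not define its graph construction, so everything here is an assumption for illustration: nodes are opcodes, a directed edge links each opcode to the one that follows it, and edge weights are normalized into transition frequencies. The function names and the sample opcode list are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def opcode_graph(opcodes: List[str]) -> Dict[Edge, int]:
    """Count directed edges between consecutive opcodes."""
    return Counter(zip(opcodes, opcodes[1:]))


def edge_features(edges: Dict[Edge, int], vocabulary: List[str]) -> List[float]:
    """Flatten the graph into a fixed-length vector of transition frequencies."""
    out_totals: Counter = Counter()
    for (src, _dst), count in edges.items():
        out_totals[src] += count
    return [
        edges.get((a, b), 0) / out_totals[a] if out_totals[a] else 0.0
        for a in vocabulary
        for b in vocabulary
    ]


sequence = ["push", "mov", "call", "mov", "call", "ret"]   # toy disassembly
vocab = sorted(set(sequence))
graph = opcode_graph(sequence)
vector = edge_features(graph, vocab)
print(dict(graph))
print(len(vector))
```

A fixed vocabulary keeps the vector length constant across executables regardless of how long each opcode sequence is, which addresses the sequence-length problem the abstract raises.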
Yedukondalu, G., Bindu, G. Hima, Pavan, J., Venkatesh, G., SaiTeja, A.
2021.
Intrusion Detection System Framework Using Machine Learning. 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA). :1224–1230.
An Intrusion Detection System (IDS) is one of the most important security tools for addressing many of the security issues prevailing in today's cyber world. An IDS is designed to scan system applications and network traffic to detect suspicious activities and issue an alert when one is discovered. Many machine learning techniques are available for intrusion detection. The main objective of this project is to apply machine learning algorithms to the data set and to compare and evaluate their performance. The proposed application uses the SVM (Support Vector Machine) and ANN (Artificial Neural Network) algorithms to detect intrusions. Each algorithm is used to detect whether the requested data is authorized or contains any anomalies. If the IDS finds any malicious information while scanning the requested data, it drops that request. Correlation-based and chi-squared-based feature selection algorithms are used to reduce the dataset by eliminating uninformative features. The preprocessed dataset is then used to train and test the models, which increases the prediction accuracy. The NSL-KDD dataset was used for the experiments. Finally, an accuracy of about 48% was achieved by the SVM algorithm and 97% by the ANN algorithm. Hence, the ANN model performs better than the SVM on this dataset.
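Chi-squared feature selection, mentioned in this abstract, can be illustrated with a small sketch. This is not the authors' pipeline: it assumes categorical (here binary) features, computes the standard chi-squared statistic of each feature against the class label from a contingency table, and keeps the highest-scoring columns. The column names and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def chi2_score(feature: List[int], labels: List[int]) -> float:
    """Chi-squared statistic of one categorical feature against the labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f in feature_totals:
        for l in label_totals:
            # Expected count if the feature and the label were independent.
            expected = feature_totals[f] * label_totals[l] / n
            score += (observed[(f, l)] - expected) ** 2 / expected
    return score


def select_top_k(columns: Dict[str, List[int]], labels: List[int], k: int) -> List[str]:
    """Return the names of the k columns with the highest chi-squared score."""
    ranked = sorted(columns, key=lambda name: chi2_score(columns[name], labels),
                    reverse=True)
    return ranked[:k]


labels = [1, 1, 0, 0]                      # 1 = intrusion, 0 = normal (toy data)
columns = {
    "syn_flag": [1, 1, 0, 0],              # tracks the label exactly
    "odd_row":  [1, 0, 1, 0],              # independent of the label
    "constant": [0, 0, 0, 0],              # carries no information
}
print(select_top_k(columns, labels, k=1))
```

Features whose score is near zero are statistically independent of the label in the sample and are the natural candidates for removal.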