Lee, Yen-Ting, Ban, Tao, Wan, Tzu-Ling, Cheng, Shin-Ming, Isawa, Ryoichi, Takahashi, Takeshi, Inoue, Daisuke.
2020.
Cross Platform IoT-Malware Family Classification Based on Printable Strings. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :775–784.
In this era of rapid network development, Internet of Things (IoT) security considerations receive a lot of attention from both the research and commercial sectors. With limited computational resources, unfriendly interfaces, and poor software implementations, legacy IoT devices are vulnerable to many infamous malware attacks. Moreover, the heterogeneity of IoT platforms and the diversity of IoT malware make the detection and classification of IoT malware even more challenging. In this paper, we propose to use printable strings as an easy-to-get but effective cross-platform feature to identify IoT malware on different IoT platforms. The discriminating capability of these strings is verified using a set of machine learning algorithms on malware family classification across different platforms. The proposed scheme shows 99% accuracy on a large-scale IoT malware dataset consisting of 120K executable files in Executable and Linkable Format (ELF) when training and testing are done on the same platform. It also achieves 96% accuracy when training is carried out on a few popular IoT platforms but testing is done on different platforms. The proposed method can enable efficient solutions to prevent and mitigate IoT malware damage across different platforms.
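As an illustration of the feature described in the abstract above, here is a minimal sketch of extracting printable strings from a binary, similar in spirit to the Unix `strings` tool. The minimum length of 4, the bag-of-strings counting, and the function names are assumptions made for this example; the paper's actual extraction and feature encoding may differ.

```python
import re
from collections import Counter


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least `min_len` printable ASCII bytes, in file order."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def string_features(data: bytes, min_len: int = 4) -> Counter:
    """Bag-of-strings feature: how often each printable string occurs."""
    return Counter(printable_strings(data, min_len))


# Hypothetical byte blob standing in for an ELF file (not a real sample).
sample = b"\x7fELF\x02\x01\x00/bin/busybox\x00\x00GET /index.html\x00\x01\x02ab\x00"
features = string_features(sample)
```

Because the strings do not depend on the instruction set, the same extraction can be applied to binaries built for any CPU architecture, which is what makes the feature a candidate for cross-platform use.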
Khan, Mamoona, Baig, Duaa, Khan, Usman Shahid, Karim, Ahmad.
2020.
Malware Classification Framework Using Convolutional Neural Network. 2020 International Conference on Cyber Warfare and Security (ICCWS). :1–7.
Cyber-security is facing a huge threat from malware and its mass production, which is driven by mutation. Classifying malware by its features is necessary for the security of the information technology (IT) community. Deep neural networks (DNN) can offer a superior solution for the detection and categorization of malware samples by using image classification techniques. To support the idea of malware classification through image recognition, we experimented by comparing two perspectives of malware classification. The first perspective applies dense neural networks to binary files, and the other applies a deep layered convolutional neural network to malware images. The proposed model is trained on a set of malware samples that are distributed into 9 different families. The dataset of malware samples used in this paper was provided by Microsoft for the Microsoft Malware Classification Challenge in 2015. The proposed model shows an accuracy of 97.80% on the provided dataset. By using the proposed model, optimum classification results can be attained.
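The "malware image" idea mentioned in the abstract above is commonly realized by treating each byte of a binary as one grayscale pixel. The sketch below shows that conversion only; the fixed row width, the zero padding, and the function name are assumptions for illustration and are not taken from the paper.

```python
def bytes_to_image(data: bytes, width: int = 16) -> list:
    """Arrange bytes row by row into a 2-D grid of pixel values in 0..255."""
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))  # zero-pad the final, shorter row
        rows.append(row)
    return rows


# Hypothetical 10-byte input reshaped into rows of 4 pixels.
image = bytes_to_image(bytes(range(10)), width=4)
```

The resulting grid can then be fed to any image classifier, such as a convolutional neural network.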
Choudhary, Sunita, Sharma, Anand.
2020.
Malware Detection & Classification Using Machine Learning. 2020 International Conference on Emerging Trends in Communication, Control and Computing (ICONC3). :1–4.
With the rapid growth and development of the web, malware is one of the major digital threats today. Hence, malware detection is an important factor in the security of computer systems. Attackers now commonly design polymorphic malware [1], a type of malware [2] that continuously changes its recognizable features to fool detection techniques that use typical signature-based methods [3]. This is why the need for machine-learning-based detection arises. In this work, we obtain behavioral patterns through static or dynamic analysis, and then apply different ML techniques to identify whether a sample is malware or not. Behavior-based detection methods [4] are discussed so as to take advantage of ML algorithms in framing a behavior-based malware recognition and classification model.
Sartoli, Sara, Wei, Yong, Hampton, Shane.
2020.
Malware Classification Using Recurrence Plots and Deep Neural Network. 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA). :901–906.
In this paper, we introduce a method for visualizing and classifying malware binaries. A malware binary consists of a series of data points of compiled machine code that represent programming components. The occurrence and recurrence behavior of these components is determined by the common tasks that malware samples in a particular family carry out. Thus, we view a malware binary as a series of emissions generated by an underlying stochastic process and use recurrence plots to transform malware binaries into two-dimensional texture images. We observe that recurrence-plot-based malware images have significant visual similarities within the same family and are different from samples in other families. We apply deep CNN classifiers to classify malware samples. The proposed approach does not require creating malware signatures or manual feature engineering. Our preliminary experimental results show that the proposed malware representation leads to higher and more stable accuracy in comparison to directly transforming malware binaries into gray-scale images.
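A recurrence plot, as used in the abstract above, marks every pair of positions in a series whose values are close to each other. The following is a minimal sketch under simplifying assumptions: the series is the raw byte values, closeness is an absolute difference of at most `eps`, and no time-delay embedding is applied. The paper's exact construction may differ.

```python
def recurrence_plot(series, eps: int = 0) -> list:
    """Binary matrix R with R[i][j] = 1 when |series[i] - series[j]| <= eps."""
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical three-byte series: positions 0 and 2 hold the same value.
plot = recurrence_plot(b"\x01\x02\x01")
```

The matrix is symmetric with ones on the diagonal, and repeated byte patterns show up as off-diagonal texture that an image classifier can pick up.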
Chen, Chin-Wei, Su, Ching-Hung, Lee, Kun-Wei, Bair, Ping-Hao.
2020.
Malware Family Classification Using Active Learning by Learning. 2020 22nd International Conference on Advanced Communication Technology (ICACT). :590–595.
In the past few years, the malware industry has been thriving. Malware variants within the same malware family share similar behavioural patterns or signatures reflecting their purpose. We propose an approach that combines support vector machine (SVM) classifiers and active learning by learning (ALBL) techniques to deal with insufficient labeled data in malware classification tasks. The proposed approach is evaluated with the malware family dataset from the Microsoft Malware Classification Challenge (BIG 2015) on Kaggle. The results show that ALBL techniques can effectively boost the performance of our machine learning models and improve the quality of labeled samples.
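Active learning repeatedly asks an expert to label the samples expected to help the model most. The sketch below shows one common query strategy for SVMs, uncertainty sampling, which picks the unlabeled samples closest to the decision boundary. It is not ALBL itself: ALBL goes further and learns which of several query strategies to use. The function name and the example decision values are hypothetical.

```python
def query_most_uncertain(decision_values, k: int = 1) -> list:
    """Indices of the k samples whose SVM decision value is closest to 0."""
    order = sorted(range(len(decision_values)),
                   key=lambda i: abs(decision_values[i]))
    return order[:k]


# Hypothetical signed distances of five unlabeled samples to the hyperplane.
to_label = query_most_uncertain([2.1, -0.2, 0.9, -1.5, 0.05], k=2)
```

The selected samples would be sent for labeling, added to the training set, and the classifier retrained before the next query round.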
Lin, Kuang-Yao, Huang, Wei-Ren.
2020.
Using Federated Learning on Malware Classification. 2020 22nd International Conference on Advanced Communication Technology (ICACT). :585–589.
In recent years, everything has become increasingly systematized, which generates many cybersecurity issues. One of the most important of these is malware. Modern malware has entered a high-growth phase. According to the AV-TEST Institute, over 350,000 new malicious programs (malware) and potentially unwanted applications (PUA) are registered every day. This threat is presented and discussed in the present paper. In addition, we also consider data privacy by using federated learning. Feature extraction can be performed based on malware. The proposed method achieves very high accuracy (≈0.9167) on the dataset provided by VirusTotal.
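Federated learning keeps raw data on each client and shares only model parameters with a server. A minimal sketch of the widely used federated averaging step follows, in which the server combines client parameters weighted by local dataset size. Whether the paper uses this exact aggregation rule is an assumption, and the names and numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes) -> list:
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]


# Two hypothetical clients holding 1 and 3 samples respectively.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the parameter vectors cross the network, so the malware samples or feature data held by each client never leave it.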
Snow, Elijah, Alam, Mahbubul, Glandon, Alexander, Iftekharuddin, Khan.
2020.
End-to-End Multimodel Deep Learning for Malware Classification. 2020 International Joint Conference on Neural Networks (IJCNN). :1–7.
Malicious software (malware) is designed to cause unwanted or destructive effects on computers. Since modern society is dependent on computers to function, malware has the potential to do untold damage. Therefore, developing techniques to effectively combat malware is critical. With the rise in popularity of polymorphic malware, conventional anti-malware techniques fail to keep up with the rate of emergence of new malware. This poses a major challenge to developing an efficient and robust malware detection technique. One approach to overcoming this challenge is to classify new malware among families of known malware. Several machine learning methods have been proposed for solving the malware classification problem. However, these techniques rely on hand-engineered features extracted from malware data, which may not be effective for classifying new malware. Deep learning models have shown considerable success in solving various classification tasks such as image and text classification. Recent deep learning techniques are capable of extracting features directly from the input data. Consequently, this paper proposes an end-to-end deep learning framework for multimodels (henceforth, multimodel learning) to solve the challenging malware classification problem. The proposed model utilizes three different deep neural network architectures to jointly learn meaningful features from different attributes of the malware data. End-to-end learning optimizes all processing steps simultaneously, which improves model accuracy and generalizability. The performance of the model is tested with the widely used and publicly available Microsoft Malware Challenge Dataset and is compared with the state-of-the-art deep learning-based malware classification pipeline. Our results suggest that the proposed model achieves comparable performance to the state-of-the-art methods while offering faster training using end-to-end multimodel learning.
Brezinski, Kenneth, Ferens, Ken.
2020.
Complexity-Based Convolutional Neural Network for Malware Classification. 2020 International Conference on Computational Science and Computational Intelligence (CSCI). :1–9.
Malware classification remains at the forefront of ongoing research as the prevalence of metamorphic malware introduces new challenges to anti-virus vendors and firms alike. One approach to malware classification is static analysis, a form of analysis that does not require malware to be executed before classification can be performed. For this reason, a lightweight classifier based on the features of a malware binary is preferred, with relatively low computational overhead. In this work, a modified convolutional neural network (CNN) architecture was deployed that integrated a complexity-based evaluation based on box-counting. This was implemented by setting up max-pooling layers in parallel, and then extracting the fractal dimension using a polyscalar relationship based on the resolution of the measurement scale and the number of elements of a malware image covered in the measurement under consideration. To test the robustness and efficacy of our approach, we trained and tested on over 9300 malware binaries from 25 unique malware families. This work was compared to other award-winning image recognition models, and results showed categorical accuracy in excess of 96.54%.
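The box-counting idea in the abstract above can be sketched outside a neural network. For each box size s, count the boxes that contain at least one nonzero pixel, then estimate the fractal dimension as the slope of log N(s) against log(1/s). Max-pooling a non-negative image with window s marks exactly these occupied boxes, which is presumably why parallel max-pooling layers suit the task. The box sizes, the least-squares fit, and the function names are assumptions for this example, not the paper's architecture.

```python
import math


def box_count(image, size: int) -> int:
    """Number of size-by-size boxes containing at least one nonzero pixel."""
    rows, cols = len(image), len(image[0])
    count = 0
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            if any(image[i][j]
                   for i in range(r, min(r + size, rows))
                   for j in range(c, min(c + size, cols))):
                count += 1
    return count


def box_counting_dimension(image, sizes=(1, 2, 4)) -> float:
    """Least-squares slope of log N(s) versus log(1/s).

    The image must contain at least one nonzero pixel, otherwise log(0) fails.
    """
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(image, s)) for s in sizes]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return numerator / sum((x - mean_x) ** 2 for x in xs)


# Hypothetical 8x8 test images: a filled square and a diagonal line.
filled = [[1] * 8 for _ in range(8)]
diagonal = [[1 if i == j else 0 for j in range(8)] for i in range(8)]
```

A filled square gives a dimension near 2 and a straight line near 1; textured malware images would fall somewhere in between.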
Ghanem, Sahar M., Aldeen, Donia Naief Saad.
2020.
AltCC: Alternating Clustering and Classification for Batch Analysis of Malware Behavior. 2020 International Symposium on Networks, Computers and Communications (ISNCC). :1–6.
The most common goal of malware analysis is to determine whether a given binary is malware or benign. Another objective is similarity analysis of malware binaries to understand how new samples differ from known ones. Similarity analysis helps to analyze malware with respect to samples already analyzed and guides the discovery of novel aspects that should be analyzed in more depth. In this work, we are concerned with detecting similarities and differences among malware binaries. Thousands of malware samples are created every day, and machine learning is an indispensable tool for their analysis. Previous work has studied clustering and classification as competing paradigms. In this work, however, a malware similarity analysis technique (AltCC) is proposed that alternates the use of clustering and classification. In addition, it assumes that the malware samples are not available all at once but are processed in batches. Initially, clustering is applied to the first batch to group similar binaries into novel malware classes. Then, the discovered classes are used to train a classifier. For the following batches, the classifier is used to decide whether a new binary belongs to a known class or is otherwise left unclassified. The unclassified binaries are clustered and the process repeats. Malware clustering (i.e., labeling) may entail further human expert analysis but dramatically reduces the effort. The effectiveness of AltCC is studied using a dataset of 29,661 malware binaries that represent malware received in six consecutive days/batches. When KMeans is used to label the dataset all at once and its labeling is compared to AltCC's, the adjusted Rand index is 0.71.
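The alternating loop described in the abstract above can be sketched with deliberately simple stand-ins: a greedy "leader" clustering in place of the paper's clustering algorithm, and a nearest-centre rule with a distance threshold in place of its classifier. These substitutions, the `radius` parameter, and all names are assumptions made to keep the example short.

```python
import math


def leader_cluster(points, radius: float) -> list:
    """Greedy clustering: a point farther than `radius` from every existing
    centre becomes a new centre."""
    centres = []
    for p in points:
        if not any(math.dist(p, c) <= radius for c in centres):
            centres.append(p)
    return centres


def classify(point, centres, radius: float):
    """Index of the nearest known centre, or None if none is within `radius`."""
    if not centres:
        return None
    best = min(range(len(centres)), key=lambda k: math.dist(point, centres[k]))
    return best if math.dist(point, centres[best]) <= radius else None


def altcc(batches, radius: float):
    """Alternate classification of known classes with clustering of the rest."""
    centres, all_labels = [], []
    for batch in batches:
        labels = [classify(p, centres, radius) for p in batch]
        unknown = [p for p, lab in zip(batch, labels) if lab is None]
        centres.extend(leader_cluster(unknown, radius))  # discover new classes
        for i, p in enumerate(batch):
            if labels[i] is None:  # label leftovers using the enlarged class set
                labels[i] = classify(p, centres, radius)
        all_labels.append(labels)
    return centres, all_labels


# Two hypothetical batches of 2-D feature vectors.
batches = [[(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)], [(0.2, 0.1), (10.0, 10.0)]]
centres, labels = altcc(batches, radius=1.0)
```

In the second batch, the first point is recognized as belonging to a class found earlier, while the second point matches nothing known and seeds a new class.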