Biblio

List
Filter

Found 72 results

Filters: Keyword is malware classification [Clear All Filters]

2023-09-18

Jia, Jingyun, Chan, Philip K.. 2022. Representation Learning with Function Call Graph Transformations for Malware Open Set Recognition. 2022 International Joint Conference on Neural Networks (IJCNN). :1—8.

Open set recognition (OSR) problem has been a challenge in many machine learning (ML) applications, such as security. As new/unknown malware families occur regularly, it is difficult to exhaust samples that cover all the classes for the training process in ML systems. An advanced malware classification system should classify the known classes correctly while sensitive to the unknown class. In this paper, we introduce a self-supervised pre-training approach for the OSR problem in malware classification. We propose two transformations for the function call graph (FCG) based malware representations to facilitate the pretext task. Also, we present a statistical thresholding approach to find the optimal threshold for the unknown class. Moreover, the experiment results indicate that our proposed pre-training process can improve different performances of different downstream loss functions for the OSR problem.

2022-12-23

Duby, Adam, Taylor, Teryl, Bloom, Gedare, Zhuang, Yanyan. 2022. Detecting and Classifying Self-Deleting Windows Malware Using Prefetch Files. 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC). :0745–0751.

Malware detection and analysis can be a burdensome task for incident responders. As such, research has turned to machine learning to automate malware detection and malware family classification. Existing work extracts and engineers static and dynamic features from the malware sample to train classifiers. Despite promising results, such techniques assume that the analyst has access to the malware executable file. Self-deleting malware invalidates this assumption and requires analysts to find forensic evidence of malware execution for further analysis. In this paper, we present and evaluate an approach to detecting malware that executed on a Windows target and further classify the malware into its associated family to provide semantic insight. Specifically, we engineer features from the Windows prefetch file, a file system forensic artifact that archives process information. Results show that it is possible to detect the malicious artifact with 99% accuracy; furthermore, classifying the malware into a fine-grained family has comparable performance to techniques that require access to the original executable. We also provide a thorough security discussion of the proposed approach against adversarial diversity.

2022-03-23

Maheswari, K. Uma, Shobana, G., Bushra, S. Nikkath, Subramanian, Nalini. 2021. Supervised malware learning in cloud through System calls analysis. 2021 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES). :1–8.

Even if there is a rapid proliferation with the advantages of low cost, the emerging on-demand cloud services have led to an increase in cybercrime activities. Cyber criminals are utilizing cloud services through its distributed nature of infrastructure and create a lot of challenges to detect and investigate the incidents by the security personnel. The tracing of command flow forms a clue for the detection of malicious activity occurring in the system through System Calls Analysis (SCA). As machine learning based approaches are known to automate the work in detecting malwares, simple Support Vector Machine (SVM) based approaches are often reporting low value of accuracy. In this work, a malware classification system proposed with the supervised machine learning of unknown malware instances through Support Vector Machine - Stochastic Gradient Descent (SVM-SGD) algorithm. The performance of the system evaluated on CIC-IDS2017 dataset with labelled attacks. The system is compared with traditional signature based detection model and observed to report less number of false alerts with improved accuracy. The signature based detection gets an accuracy of 86.12%, while the SVM-SGD gets the best accuracy of 99.13%. The model is found to be lightweight but efficient in detecting malware with high degree of accuracy.

2022-02-07

Ben Abdel Ouahab, Ikram, Elaachak, Lotfi, Alluhaidan, Yasser A., Bouhorma, Mohammed. 2021. A new approach to detect next generation of malware based on machine learning. 2021 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). :230–235.

In these days, malware attacks target different kinds of devices as IoT, mobiles, servers even the cloud. It causes several hardware damages and financial losses especially for big companies. Malware attacks represent a serious issue to cybersecurity specialists. In this paper, we propose a new approach to detect unknown malware families based on machine learning classification and visualization technique. A malware binary is converted to grayscale image, then for each image a GIST descriptor is used as input to the machine learning model. For the malware classification part we use 3 machine learning algorithms. These classifiers are so efficient where the highest precision reach 98%. Once we train, test and evaluate models we move to simulate 2 new malware families. We do not expect a good prediction since the model did not know the family; however our goal is to analyze the behavior of our classifiers in the case of new family. Finally, we propose an approach using a filter to know either the classification is normal or it's a zero-day malware.

Osman, Mohd Zamri, Abidin, Ahmad Firdaus Zainal, Romli, Rahiwan Nazar, Darmawan, Mohd Faaizie. 2021. Pixel-based Feature for Android Malware Family Classification using Machine Learning Algorithms. 2021 International Conference on Software Engineering Computer Systems and 4th International Conference on Computational Science and Information Management (ICSECS-ICOCSIM). :552–555.

‘Malicious software’ or malware has been a serious threat to the security and privacy of all mobile phone users. Due to the popularity of smartphones, primarily Android, this makes them a very viable target for spreading malware. In the past, many solutions have proved ineffective and have resulted in many false positives. Having the ability to identify and classify malware will help prevent them from spreading and evolving. In this paper, we study the effectiveness of the proposed classification of the malware family using a pixel level as features. This study has implemented well-known machine learning and deep learning classifiers such as K-Nearest Neighbours (k-NN), Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree, and Random Forest. A binary file of 25 malware families is converted into a fixed grayscale image. The grayscale images were then extracted transforming the size 100x100 into a single format into 100000 columns. During this phase, none of the columns are removed as to remain the patterns in each malware family. The experimental results show that our approach achieved 92% accuracy in Random Forest, 88% in SVM, 81% in Decision Tree, 80% in k-NN and 56% in Naïve Bayes classifier. Overall, the pixel-based feature also reveals a promising technique for identifying the family of malware with great accuracy, especially using the Random Forest classifier.

Singh, Shirish, Kaiser, Gail. 2021. Metamorphic Detection of Repackaged Malware. 2021 IEEE/ACM 6th International Workshop on Metamorphic Testing (MET). :9–16.

Machine learning-based malware detection systems are often vulnerable to evasion attacks, in which a malware developer manipulates their malicious software such that it is misclassified as benign. Such software hides some properties of the real class or adopts some properties of a different class by applying small perturbations. A special case of evasive malware hides by repackaging a bonafide benign mobile app to contain malware in addition to the original functionality of the app, thus retaining most of the benign properties of the original app. We present a novel malware detection system based on metamorphic testing principles that can detect such benign-seeming malware apps. We apply metamorphic testing to the feature representation of the mobile app, rather than to the app itself. That is, the source input is the original feature vector for the app and the derived input is that vector with selected features removed. If the app was originally classified benign, and is indeed benign, the output for the source and derived inputs should be the same class, i.e., benign, but if they differ, then the app is exposed as (likely) malware. Malware apps originally classified as malware should retain that classification, since only features prevalent in benign apps are removed. This approach enables the machine learning model to classify repackaged malware with reasonably few false negatives and false positives. Our training pipeline is simpler than many existing ML-based malware detection methods, as the network is trained end-to-end to jointly learn appropriate features and to perform classification. We pre-trained our classifier model on 3 million apps collected from the widely-used AndroZoo dataset.1 We perform an extensive study on other publicly available datasets to show our approach's effectiveness in detecting repackaged malware with more than 94% accuracy, 0.98 precision, 0.95 recall, and 0.96 F1 score.

Khetarpal, Anavi, Mallik, Abhishek. 2021. Visual Malware Classification Using Transfer Learning. 2021 Fourth International Conference on Electrical, Computer and Communication Technologies (ICECCT). :1–5.

The proliferation of malware attacks causes a hindrance to cybersecurity thus, posing a significant threat to our devices. The variety and number of both known as well as unknown malware makes it difficult to detect it. Research suggests that the ramifications of malware are only becoming worse with time and hence malware analysis becomes crucial. This paper proposes a visual malware classification technique to convert malware executables into their visual representations and obtain grayscale images of malicious files. These grayscale images are then used to classify malicious files into their respective malware families by passing them through deep convolutional neural networks (CNN). As part of deep CNN, we use various ImageNet models and compare their performance.

Wang, Shuwei, Wang, Qiuyun, Jiang, Zhengwei, Wang, Xuren, Jing, Rongqi. 2021. A Weak Coupling of Semi-Supervised Learning with Generative Adversarial Networks for Malware Classification. 2020 25th International Conference on Pattern Recognition (ICPR). :3775–3782.

Malware classification helps to understand its purpose and is also an important part of attack detection. And it is also an important part of discovering attacks. Due to continuous innovation and development of artificial intelligence, it is a trend to combine deep learning with malware classification. In this paper, we propose an improved malware image rescaling algorithm (IMIR) based on local mean algorithm. Its main goal of IMIR is to reduce the loss of information from samples during the process of converting binary files to image files. Therefore, we construct a neural network structure based on VGG model, which is suitable for image classification. In the real world, a mass of malware family labels are inaccurate or lacking. To deal with this situation, we propose a novel method to train the deep neural network by Semi-supervised Generative Adversarial Network (SGAN), which only needs a small amount of malware that have accurate labels about families. By integrating SGAN with weak coupling, we can retain the weak links of supervised part and unsupervised part of SGAN. It improves the accuracy of malware classification by making classifiers more independent of discriminators. The results of experimental demonstrate that our model achieves exhibiting favorable performance. The recalls of each family in our data set are all higher than 93.75%.

Kumar, Shashank, Meena, Shivangi, Khosla, Savya, Parihar, Anil Singh. 2021. AE-DCNN: Autoencoder Enhanced Deep Convolutional Neural Network For Malware Classification. 2021 International Conference on Intelligent Technologies (CONIT). :1–5.

Malware classification is a problem of great significance in the domain of information security. This is because the classification of malware into respective families helps in determining their intent, activity, and level of threat. In this paper, we propose a novel deep learning approach to malware classification. The proposed method converts malware executables into image-based representations. These images are then classified into different malware families using an autoencoder enhanced deep convolutional neural network (AE-DCNN). In particular, we propose a novel training mechanism wherein a DCNN classifier is trained with the help of an encoder. We conjecture that using an encoder in the proposed way provides the classifier with the extra information that is perhaps lost during the forward propagation, thereby leading to better results. The proposed approach eliminates the use of feature engineering, reverse engineering, disassembly, and other domain-specific techniques earlier used for malware classification. On the standard Malimg dataset, we achieve a 10-fold cross-validation accuracy of 99.38% and F1-score of 99.38%. Further, due to the texture-based analysis of malware files, the proposed technique is resilient to several obfuscation techniques.

Abdelmonem, Salma, Seddik, Shahd, El-Sayed, Rania, Kaseb, Ahmed S.. 2021. Enhancing Image-Based Malware Classification Using Semi-Supervised Learning. 2021 3rd Novel Intelligent and Leading Emerging Sciences Conference (NILES). :125–128.

Malicious software (malware) creators are constantly mutating malware files in order to avoid detection, resulting in hundreds of millions of new malware every year. Therefore, most malware files are unlabeled due to the time and cost needed to label them manually. This makes it very challenging to perform malware detection, i.e., deciding whether a file is malware or not, and malware classification, i.e., determining the family of the malware. Most solutions use supervised learning (e.g., ResNet and VGG) whose accuracy degrades significantly with the lack of abundance of labeled data. To solve this problem, this paper proposes a semi-supervised learning model for image-based malware classification. In this model, malware files are represented as grayscale images, and semi-supervised learning is carefully selected to handle the plethora of unlabeled data. Our proposed model is an enhanced version of the ∏-model, which makes it more accurate and consistent. Experiments show that our proposed model outperforms the original ∏-model by 4% in accuracy and three other supervised models by 6% in accuracy especially when the ratio of labeled samples is as low as 20%.

Lee, Shan-Hsin, Lan, Shen-Chieh, Huang, Hsiu-Chuan, Hsu, Chia-Wei, Chen, Yung-Shiu, Shieh, Shiuhpyng. 2021. EC-Model: An Evolvable Malware Classification Model. 2021 IEEE Conference on Dependable and Secure Computing (DSC). :1–8.

Malware evolves quickly as new attack, evasion and mutation techniques are commonly used by hackers to build new malicious malware families. For malware detection and classification, multi-class learning model is one of the most popular machine learning models being used. To recognize malicious programs, multi-class model requires malware types to be predefined as output classes in advance which cannot be dynamically adjusted after the model is trained. When a new variant or type of malicious programs is discovered, the trained multi-class model will be no longer valid and have to be retrained completely. This consumes a significant amount of time and resources, and cannot adapt quickly to meet the timely requirement in dealing with dynamically evolving malware types. To cope with the problem, an evolvable malware classification deep learning model, namely EC-Model, is proposed in this paper which can dynamically adapt to new malware types without the need of fully retraining. Consequently, the reaction time can be significantly reduced to meet the timely requirement of malware classification. To our best knowledge, our work is the first attempt to adopt multi-task, deep learning for evolvable malware classification.

Gao, Tan, Li, Xudong, Chen, Wen. 2021. Co-training For Image-Based Malware Classification. 2021 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC). :568–572.

A malware detection model based on semi-supervised learning is proposed in the paper. Our model includes mainly three parts: malware visualization, feature extraction, and classification. Firstly, the malware visualization converts malware into grayscale images; then the features of the images are extracted to reflect the coding patterns of malware; finally, a collaborative learning model is applied to malware detections using both labeled and unlabeled software samples. The proposed model was evaluated based on two commonly used benchmark datasets. The results demonstrated that compared with traditional methods, our model not only reduced the cost of sample labeling but also improved the detection accuracy through incorporating unlabeled samples into the collaborative learning process, thereby achieved higher classification performance.

Keyes, David Sean, Li, Beiqi, Kaur, Gurdip, Lashkari, Arash Habibi, Gagnon, Francois, Massicotte, Frédéric. 2021. EntropLyzer: Android Malware Classification and Characterization Using Entropy Analysis of Dynamic Characteristics. 2021 Reconciling Data Analytics, Automation, Privacy, and Security: A Big Data Challenge (RDAAPS). :1–12.

The unmatched threat of Android malware has tremendously increased the need for analyzing prominent malware samples. There are remarkable efforts in static and dynamic malware analysis using static features and API calls respectively. Nonetheless, there is a void to classify Android malware by analyzing its behavior using multiple dynamic characteristics. This paper proposes EntropLyzer, an entropy-based behavioral analysis technique for classifying the behavior of 12 eminent Android malware categories and 147 malware families taken from CCCS-CIC-AndMal2020 dataset. This work uses six classes of dynamic characteristics including memory, API, network, logcat, battery, and process to classify and characterize Android malware. Results reveal that the entropy-based analysis successfully determines the behavior of all malware categories and most of the malware families before and after rebooting the emulator.

Or-Meir, Ori, Cohen, Aviad, Elovici, Yuval, Rokach, Lior, Nissim, Nir. 2021. Pay Attention: Improving Classification of PE Malware Using Attention Mechanisms Based on System Call Analysis. 2021 International Joint Conference on Neural Networks (IJCNN). :1–8.

Malware poses a threat to computing systems worldwide, and security experts work tirelessly to detect and classify malware as accurately and quickly as possible. Since malware can use evasion techniques to bypass static analysis and security mechanisms, dynamic analysis methods are more useful for accurately analyzing the behavioral patterns of malware. Previous studies showed that malware behavior can be represented by sequences of executed system calls and that machine learning algorithms can leverage such sequences for the task of malware classification (a.k.a. malware categorization). Accurate malware classification is helpful for malware signature generation and is thus beneficial to antivirus vendors; this capability is also valuable to organizational security experts, enabling them to mitigate malware attacks and respond to security incidents. In this paper, we propose an improved methodology for malware classification, based on analyzing sequences of system calls invoked by malware in a dynamic analysis environment. We show that adding an attention mechanism to a LSTM model improves accuracy for the task of malware classification, thus outperforming the state-of-the-art algorithm by up to 6%. We also show that the transformer architecture can be used to analyze very long sequences with significantly lower time complexity for training and prediction. Our proposed method can serve as the basis for a decision support system for security experts, for the task of malware categorization.

2021-09-21

Sartoli, Sara, Wei, Yong, Hampton, Shane. 2020. Malware Classification Using Recurrence Plots and Deep Neural Network. 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA). :901–906.

In this paper, we introduce a method for visualizing and classifying malware binaries. A malware binary consists of a series of data points of compiled machine codes that represent programming components. The occurrence and recurrence behavior of these components is determined by the common tasks malware samples in a particular family carry out. Thus, we view a malware binary as a series of emissions generated by an underlying stochastic process and use recurrence plots to transform malware binaries into two-dimensional texture images. We observe that recurrence plot-based malware images have significant visual similarities within the same family and are different from samples in other families. We apply deep CNN classifiers to classify malware samples. The proposed approach does not require creating malware signature or manual feature engineering. Our preliminary experimental results show that the proposed malware representation leads to a higher and more stable accuracy in comparison to directly transforming malware binaries to gray-scale images.

Chen, Chin-Wei, Su, Ching-Hung, Lee, Kun-Wei, Bair, Ping-Hao. 2020. Malware Family Classification Using Active Learning by Learning. 2020 22nd International Conference on Advanced Communication Technology (ICACT). :590–595.

In the past few years, the malware industry has been thriving. Malware variants among the same malware family shared similar behavioural patterns or signatures reflecting their purpose. We propose an approach that combines support vector machine (SVM) classifiers and active learning by learning (ALBL) techniques to deal with insufficient labeled data in terms of the malware classification tasks. The proposed approach is evaluated with the malware family dataset from Microsoft Malware Classification Challenge (BIG 2015) on Kaggle. The results show that ALBL techniques can effectively boost the performance of our machine learning models and improve the quality of labeled samples.

Snow, Elijah, Alam, Mahbubul, Glandon, Alexander, Iftekharuddin, Khan. 2020. End-to-End Multimodel Deep Learning for Malware Classification. 2020 International Joint Conference on Neural Networks (IJCNN). :1–7.

Malicious software (malware) is designed to cause unwanted or destructive effects on computers. Since modern society is dependent on computers to function, malware has the potential to do untold damage. Therefore, developing techniques to effectively combat malware is critical. With the rise in popularity of polymorphic malware, conventional anti-malware techniques fail to keep up with the rate of emergence of new malware. This poses a major challenge towards developing an efficient and robust malware detection technique. One approach to overcoming this challenge is to classify new malware among families of known malware. Several machine learning methods have been proposed for solving the malware classification problem. However, these techniques rely on hand-engineered features extracted from malware data which may not be effective for classifying new malware. Deep learning models have shown paramount success for solving various classification tasks such as image and text classification. Recent deep learning techniques are capable of extracting features directly from the input data. Consequently, this paper proposes an end-to-end deep learning framework for multimodels (henceforth, multimodel learning) to solve the challenging malware classification problem. The proposed model utilizes three different deep neural network architectures to jointly learn meaningful features from different attributes of the malware data. End-to-end learning optimizes all processing steps simultaneously, which improves model accuracy and generalizability. The performance of the model is tested with the widely used and publicly available Microsoft Malware Challenge Dataset and is compared with the state-of-the-art deep learning-based malware classification pipeline. Our results suggest that the proposed model achieves comparable performance to the state-of-the-art methods while offering faster training using end-to-end multimodel learning.

Brezinski, Kenneth, Ferens, Ken. 2020. Complexity-Based Convolutional Neural Network for Malware Classification. 2020 International Conference on Computational Science and Computational Intelligence (CSCI). :1–9.

Malware classification remains at the forefront of ongoing research as the prevalence of metamorphic malware introduces new challenges to anti-virus vendors and firms alike. One approach to malware classification is Static Analysis - a form of analysis which does not require malware to be executed before classification can be performed. For this reason, a lightweight classifier based on the features of a malware binary is preferred, with relatively low computational overhead. In this work a modified convolutional neural network (CNN) architecture was deployed which integrated a complexity-based evaluation based on box-counting. This was implemented by setting up max-pooling layers in parallel, and then extracting the fractal dimension using a polyscalar relationship based on the resolution of the measurement scale and the number of elements of a malware image covered in the measurement under consideration. To test the robustness and efficacy of our approach we trained and tested on over 9300 malware binaries from 25 unique malware families. This work was compared to other award-winning image recognition models, and results showed categorical accuracy in excess of 96.54%.

2021-06-24

ManiArasuSekar, KannanMani S., Swaminathan, Paveethran, Murali, Ritwik, Ratan, Govind K., Siva, Surya V.. 2020. Optimal Feature Selection for Non-Network Malware Classification. 2020 International Conference on Inventive Computation Technologies (ICICT). :82—87.

In this digital age, almost every system and service has moved from a localized to a digital environment. Consequently the number of attacks targeting both personal as well as commercial digital devices has also increased exponentially. In most cases specific malware attacks have caused widespread damage and emotional anguish. Though there are automated techniques to analyse and thwart such attacks, they are still far from perfect. This paper identifies optimal features, which improves the accuracy and efficiency of the classification process, required for malware classification in an attempt to assist automated anti-malware systems identify and block malware families in an attempt to secure the end user and reduce the damage caused by these malicious software.

2020-12-11

Slawinski, M., Wortman, A.. 2019. Applications of Graph Integration to Function Comparison and Malware Classification. 2019 4th International Conference on System Reliability and Safety (ICSRS). :16—24.

We classify .NET files as either benign or malicious by examining directed graphs derived from the set of functions comprising the given file. Each graph is viewed probabilistically as a Markov chain where each node represents a code block of the corresponding function, and by computing the PageRank vector (Perron vector with transport), a probability measure can be defined over the nodes of the given graph. Each graph is vectorized by computing Lebesgue antiderivatives of hand-engineered functions defined on the vertex set of the given graph against the PageRank measure. Files are subsequently vectorized by aggregating the set of vectors corresponding to the set of graphs resulting from decompiling the given file. The result is a fast, intuitive, and easy-to-compute glass-box vectorization scheme, which can be leveraged for training a standalone classifier or to augment an existing feature space. We refer to this vectorization technique as PageRank Measure Integration Vectorization (PMIV). We demonstrate the efficacy of PMIV by training a vanilla random forest on 2.5 million samples of decompiled. NET, evenly split between benign and malicious, from our in-house corpus and compare this model to a baseline model which leverages a text-only feature space. The median time needed for decompilation and scoring was 24ms. 11Code available at https://github.com/gtownrocks/grafuple.

2020-10-29

Vi, Bao Ngoc, Noi Nguyen, Huu, Nguyen, Ngoc Tran, Truong Tran, Cao. 2019. Adversarial Examples Against Image-based Malware Classification Systems. 2019 11th International Conference on Knowledge and Systems Engineering (KSE). :1—5.

Malicious software, known as malware, has become urgently serious threat for computer security, so automatic mal-ware classification techniques have received increasing attention. In recent years, deep learning (DL) techniques for computer vision have been successfully applied for malware classification by visualizing malware files and then using DL to classify visualized images. Although DL-based classification systems have been proven to be much more accurate than conventional ones, these systems have been shown to be vulnerable to adversarial attacks. However, there has been little research to consider the danger of adversarial attacks to visualized image-based malware classification systems. This paper proposes an adversarial attack method based on the gradient to attack image-based malware classification systems by introducing perturbations on resource section of PE files. The experimental results on the Malimg dataset show that by a small interference, the proposed method can achieve success attack rate when challenging convolutional neural network malware classifiers.

Xylogiannopoulos, Konstantinos F., Karampelas, Panagiotis, Alhajj, Reda. 2019. Text Mining for Malware Classification Using Multivariate All Repeated Patterns Detection. 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). :887—894.

Mobile phones have become nowadays a commodity to the majority of people. Using them, people are able to access the world of Internet and connect with their friends, their colleagues at work or even unknown people with common interests. This proliferation of the mobile devices has also been seen as an opportunity for the cyber criminals to deceive smartphone users and steel their money directly or indirectly, respectively, by accessing their bank accounts through the smartphones or by blackmailing them or selling their private data such as photos, credit card data, etc. to third parties. This is usually achieved by installing malware to smartphones masking their malevolent payload as a legitimate application and advertise it to the users with the hope that mobile users will install it in their devices. Thus, any existing application can easily be modified by integrating a malware and then presented it as a legitimate one. In response to this, scientists have proposed a number of malware detection and classification methods using a variety of techniques. Even though, several of them achieve relatively high precision in malware classification, there is still space for improvement. In this paper, we propose a text mining all repeated pattern detection method which uses the decompiled files of an application in order to classify a suspicious application into one of the known malware families. Based on the experimental results using a real malware dataset, the methodology tries to correctly classify (without any misclassification) all randomly selected malware applications of 3 categories with 3 different families each.

Choi, Seok-Hwan, Shin, Jin-Myeong, Liu, Peng, Choi, Yoon-Ho. 2019. Robustness Analysis of CNN-based Malware Family Classification Methods Against Various Adversarial Attacks. 2019 IEEE Conference on Communications and Network Security (CNS). :1—6.

As malware family classification methods, image-based classification methods have attracted much attention. Especially, due to the fast classification speed and the high classification accuracy, Convolutional Neural Network (CNN)-based malware family classification methods have been studied. However, previous studies on CNN-based classification methods focused only on improving the classification accuracy of malware families. That is, previous studies did not consider the cases that the accuracy of CNN-based malware classification methods can be decreased under the existence of adversarial attacks. In this paper, we analyze the robustness of various CNN-based malware family classification models under adversarial attacks. While adding imperceptible non-random perturbations to the input image, we measured how the accuracy of the CNN-based malware family classification model can be affected. Also, we showed the influence of three significant visualization parameters(i.e., the size of input image, dimension of input image, and conversion color of a special character)on the accuracy variation under adversarial attacks. From the evaluation results using the Microsoft malware dataset, we showed that even the accuracy over 98% of the CNN-based malware family classification method can be decreased to less than 7%.

Roseline, S. Abijah, Sasisri, A. D., Geetha, S., Balasubramanian, C.. 2019. Towards Efficient Malware Detection and Classification using Multilayered Random Forest Ensemble Technique. 2019 International Carnahan Conference on Security Technology (ICCST). :1—6.

The exponential growth rate of malware causes significant security concern in this digital era to computer users, private and government organizations. Traditional malware detection methods employ static and dynamic analysis, which are ineffective in identifying unknown malware. Malware authors develop new malware by using polymorphic and evasion techniques on existing malware and escape detection. Newly arriving malware are variants of existing malware and their patterns can be analyzed using the vision-based method. Malware patterns are visualized as images and their features are characterized. The alternative generation of class vectors and feature vectors using ensemble forests in multiple sequential layers is performed for classifying malware. This paper proposes a hybrid stacked multilayered ensembling approach which is robust and efficient than deep learning models. The proposed model outperforms the machine learning and deep learning models with an accuracy of 98.91%. The proposed system works well for small-scale and large-scale data since its adaptive nature of setting parameters (number of sequential levels) automatically. It is computationally efficient in terms of resources and time. The method uses very fewer hyper-parameters compared to deep neural networks.

Priyamvada Davuluru, Venkata Salini, Narayanan Narayanan, Barath, Balster, Eric J.. 2019. Convolutional Neural Networks as Classification Tools and Feature Extractors for Distinguishing Malware Programs. 2019 IEEE National Aerospace and Electronics Conference (NAECON). :273—278.

Classifying malware programs is a research area attracting great interest for Anti-Malware industry. In this research, we propose a system that visualizes malware programs as images and distinguishes those using Convolutional Neural Networks (CNNs). We study the performance of several well-established CNN based algorithms such as AlexNet, ResNet and VGG16 using transfer learning approaches. We also propose a computationally efficient CNN-based architecture for classification of malware programs. In addition, we study the performance of these CNNs as feature extractors by using Support Vector Machine (SVM) and K-nearest Neighbors (kNN) for classification purposes. We also propose fusion methods to boost the performance further. We make use of the publicly available database provided by Microsoft Malware Classification Challenge (BIG 2015) for this study. Our overall performance is 99.4% for a set of 2174 test samples comprising 9 different classes thereby setting a new benchmark.