Visible to the public Biblio

Found 522 results

Filters: Keyword is Malware  [Clear All Filters]
2020-12-11
Ge, X., Pan, Y., Fan, Y., Fang, C..  2019.  AMDroid: Android Malware Detection Using Function Call Graphs. 2019 IEEE 19th International Conference on Software Quality, Reliability and Security Companion (QRS-C). :71—77.

With the rapid development of the mobile Internet, Android has been the most popular mobile operating system. Due to the open nature of Android, c countless malicious applications are hidden in a large number of benign applications, which pose great threats to users. Most previous malware detection approaches mainly rely on features such as permissions, API calls, and opcode sequences. However, these approaches fail to capture structural semantics of applications. In this paper, we propose AMDroid that leverages function call graphs (FCGs) representing the behaviors of applications and applies graph kernels to automatically learn the structural semantics of applications from FCGs. We evaluate AMDroid on the Genome Project, and the experimental results show that AMDroid is effective to detect Android malware with 97.49% detection accuracy.

Abusnaina, A., Khormali, A., Alasmary, H., Park, J., Anwar, A., Mohaisen, A..  2019.  Adversarial Learning Attacks on Graph-based IoT Malware Detection Systems. 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). :1296—1305.

IoT malware detection using control flow graph (CFG)-based features and deep learning networks are widely explored. The main goal of this study is to investigate the robustness of such models against adversarial learning. We designed two approaches to craft adversarial IoT software: off-the-shelf methods and Graph Embedding and Augmentation (GEA) method. In the off-the-shelf adversarial learning attack methods, we examine eight different adversarial learning methods to force the model to misclassification. The GEA approach aims to preserve the functionality and practicality of the generated adversarial sample through a careful embedding of a benign sample to a malicious one. Intensive experiments are conducted to evaluate the performance of the proposed method, showing that off-the-shelf adversarial attack methods are able to achieve a misclassification rate of 100%. In addition, we observed that the GEA approach is able to misclassify all IoT malware samples as benign. The findings of this work highlight the essential need for more robust detection tools against adversarial learning, including features that are not easy to manipulate, unlike CFG-based features. The implications of the study are quite broad, since the approach challenged in this work is widely used for other applications using graphs.

Fan, M., Luo, X., Liu, J., Wang, M., Nong, C., Zheng, Q., Liu, T..  2019.  Graph Embedding Based Familial Analysis of Android Malware using Unsupervised Learning. 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). :771—782.

The rapid growth of Android malware has posed severe security threats to smartphone users. On the basis of the familial trait of Android malware observed by previous work, the familial analysis is a promising way to help analysts better focus on the commonalities of malware samples within the same families, thus reducing the analytical workload and accelerating malware analysis. The majority of existing approaches rely on supervised learning and face three main challenges, i.e., low accuracy, low efficiency, and the lack of labeled dataset. To address these challenges, we first construct a fine-grained behavior model by abstracting the program semantics into a set of subgraphs. Then, we propose SRA, a novel feature that depicts the similarity relationships between the Structural Roles of sensitive API call nodes in subgraphs. An SRA is obtained based on graph embedding techniques and represented as a vector, thus we can effectively reduce the high complexity of graph matching. After that, instead of training a classifier with labeled samples, we construct malware link network based on SRAs and apply community detection algorithms on it to group the unlabeled samples into groups. We implement these ideas in a system called GefDroid that performs Graph embedding based familial analysis of AnDroid malware using unsupervised learning. Moreover, we conduct extensive experiments to evaluate GefDroid on three datasets with ground truth. The results show that GefDroid can achieve high agreements (0.707-0.883 in term of NMI) between the clustering results and the ground truth. Furthermore, GefDroid requires only linear run-time overhead and takes around 8.6s to analyze a sample on average, which is considerably faster than the previous work.

Wu, Y., Li, X., Zou, D., Yang, W., Zhang, X., Jin, H..  2019.  MalScan: Fast Market-Wide Mobile Malware Scanning by Social-Network Centrality Analysis. 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). :139—150.

Malware scanning of an app market is expected to be scalable and effective. However, existing approaches use either syntax-based features which can be evaded by transformation attacks or semantic-based features which are usually extracted by performing expensive program analysis. Therefor, in this paper, we propose a lightweight graph-based approach to perform Android malware detection. Instead of traditional heavyweight static analysis, we treat function call graphs of apps as social networks and perform social-network-based centrality analysis to represent the semantic features of the graphs. Our key insight is that centrality provides a succinct and fault-tolerant representation of graph semantics, especially for graphs with certain amount of inaccurate information (e.g., inaccurate call graphs). We implement a prototype system, MalScan, and evaluate it on datasets of 15,285 benign samples and 15,430 malicious samples. Experimental results show that MalScan is capable of detecting Android malware with up to 98% accuracy under one second which is more than 100 times faster than two state-of-the-art approaches, namely MaMaDroid and Drebin. We also demonstrate the feasibility of MalScan on market-wide malware scanning by performing a statistical study on over 3 million apps. Finally, in a corpus of dataset collected from Google-Play app market, MalScan is able to identify 18 zero-day malware including malware samples that can evade detection of existing tools.

2020-12-02
Malvankar, A., Payne, J., Budhraja, K. K., Kundu, A., Chari, S., Mohania, M..  2019.  Malware Containment in Cloud. 2019 First IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA). :221—227.

Malware is pervasive and poses serious threats to normal operation of business processes in cloud. Cloud computing environments typically have hundreds of hosts that are connected to each other, often with high risk trust assumptions and/or protection mechanisms that are not difficult to break. Malware often exploits such weaknesses, as its immediate goal is often to spread itself to as many hosts as possible. Detecting this propagation is often difficult to address because the malware may reside in multiple components across the software or hardware stack. In this scenario, it is usually best to contain the malware to the smallest possible number of hosts, and it's also critical for system administration to resolve the issue in a timely manner. Furthermore, resolution often requires that several participants across different organizational teams scramble together to address the intrusion. In this vision paper, we define this problem in detail. We then present our vision of decentralized malware containment and the challenges and issues associated with this vision. The approach of containment involves detection and response using graph analytics coupled with a blockchain framework. We propose the use of a dominance frontier for profile nodes which must be involved in the containment process. Smart contracts are used to obtain consensus amongst the involved parties. The paper presents a basic implementation of this proposal. We have further discussed some open problems related to our vision.

2020-12-01
Usama, M., Asim, M., Latif, S., Qadir, J., Ala-Al-Fuqaha.  2019.  Generative Adversarial Networks For Launching and Thwarting Adversarial Attacks on Network Intrusion Detection Systems. 2019 15th International Wireless Communications Mobile Computing Conference (IWCMC). :78—83.

Intrusion detection systems (IDSs) are an essential cog of the network security suite that can defend the network from malicious intrusions and anomalous traffic. Many machine learning (ML)-based IDSs have been proposed in the literature for the detection of malicious network traffic. However, recent works have shown that ML models are vulnerable to adversarial perturbations through which an adversary can cause IDSs to malfunction by introducing a small impracticable perturbation in the network traffic. In this paper, we propose an adversarial ML attack using generative adversarial networks (GANs) that can successfully evade an ML-based IDS. We also show that GANs can be used to inoculate the IDS and make it more robust to adversarial perturbations.

SAADI, C., kandrouch, i, CHAOUI, H..  2019.  Proposed security by IDS-AM in Android system. 2019 5th International Conference on Optimization and Applications (ICOA). :1—7.

Mobile systems are always growing, automatically they need enough resources to secure them. Indeed, traditional techniques for protecting the mobile environment are no longer effective. We need to look for new mechanisms to protect the mobile environment from malicious behavior. In this paper, we examine one of the most popular systems, Android OS. Next, we will propose a distributed architecture based on IDS-AM to detect intrusions by mobile agents (IDS-AM).

2020-11-30
Stokes, J. W., Agrawal, R., McDonald, G., Hausknecht, M..  2019.  ScriptNet: Neural Static Analysis for Malicious JavaScript Detection. MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM). :1–8.
Malicious scripts are an important computer infection threat vector for computer users. For internet-scale processing, static analysis offers substantial computing efficiencies. We propose the ScriptNet system for neural malicious JavaScript detection which is based on static analysis. We also propose a novel deep learning model, Pre-Informant Learning (PIL), which processes Javascript files as byte sequences. Lower layers capture the sequential nature of these byte sequences while higher layers classify the resulting embedding as malicious or benign. Unlike previously proposed solutions, our model variants are trained in an end-to-end fashion allowing discriminative training even for the sequential processing layers. Evaluating this model on a large corpus of 212,408 JavaScript files indicates that the best performing PIL model offers a 98.10% true positive rate (TPR) for the first 60K byte subsequences and 81.66% for the full-length files, at a false positive rate (FPR) of 0.50%. Both models significantly outperform several baseline models. The best performing PIL model can successfully detect 92.02% of unknown malware samples in a hindsight experiment where the true labels of the malicious JavaScript files were not known when the model was trained.
2020-11-23
Gao, Y., Li, X., Li, J., Gao, Y., Guo, N..  2018.  Graph Mining-based Trust Evaluation Mechanism with Multidimensional Features for Large-scale Heterogeneous Threat Intelligence. 2018 IEEE International Conference on Big Data (Big Data). :1272–1277.
More and more organizations and individuals start to pay attention to real-time threat intelligence to protect themselves from the complicated, organized, persistent and weaponized cyber attacks. However, most users worry about the trustworthiness of threat intelligence provided by TISPs (Threat Intelligence Sharing Platforms). The trust evaluation mechanism has become a hot topic in applications of TISPs. However, most current TISPs do not present any practical solution for trust evaluation of threat intelligence itself. In this paper, we propose a graph mining-based trust evaluation mechanism with multidimensional features for large-scale heterogeneous threat intelligence. This mechanism provides a feasible scheme and achieves the task of trust evaluation for TISP, through the integration of a trust-aware intelligence architecture model, a graph mining-based intelligence feature extraction method, and an automatic and interpretable trust evaluation algorithm. We implement this trust evaluation mechanism in a practical TISP (called GTTI), and evaluate the performance of our system on a real-world dataset from three popular cyber threat intelligence sharing platforms. Experimental results show that our mechanism can achieve 92.83% precision and 93.84% recall in trust evaluation. To the best of our knowledge, this work is the first to evaluate the trust level of heterogeneous threat intelligence automatically from the perspective of graph mining with multidimensional features including source, content, time, and feedback. Our work is beneficial to provide assistance on intelligence quality for the decision-making of human analysts, build a trust-aware threat intelligence sharing platform, and enhance the availability of heterogeneous threat intelligence to protect organizations against cyberspace attacks effectively.
2020-11-17
Qian, K., Parizi, R. M., Lo, D..  2018.  OWASP Risk Analysis Driven Security Requirements Specification for Secure Android Mobile Software Development. 2018 IEEE Conference on Dependable and Secure Computing (DSC). :1—2.
The security threats to mobile applications are growing explosively. Mobile apps flaws and security defects open doors for hackers to break in and access sensitive information. Defensive requirements analysis should be an integral part of secure mobile SDLC. Developers need to consider the information confidentiality and data integrity, to verify the security early in the development lifecycle rather than fixing the security holes after attacking and data leaks take place. Early eliminating known security vulnerabilities will help developers increase the security of apps and reduce the likelihood of exploitation. However, many software developers lack the necessary security knowledge and skills at the development stage, and that's why Secure Mobile Software Development education is very necessary for mobile software engineers. In this paper, we propose a guided security requirement analysis based on OWASP Mobile Top ten security risk recommendations for Android mobile software development and its traceability of the developmental controls in SDLC. Building secure apps immune to the OWASP Mobile Top ten risks would be an effective approach to provide very useful mobile security guidelines.
Jaiswal, M., Malik, Y., Jaafar, F..  2018.  Android gaming malware detection using system call analysis. 2018 6th International Symposium on Digital Forensic and Security (ISDFS). :1—5.
Android operating systems have become a prime target for attackers as most of the market is currently dominated by Android users. The situation gets worse when users unknowingly download or sideload cloning applications, especially gaming applications that look like benign games. In this paper, we present, a dynamic Android gaming malware detection system based on system call analysis to classify malicious and legitimate games. We performed the dynamic system call analysis on normal and malicious gaming applications while applications are in execution state. Our analysis reveals the similarities and differences between benign and malware game system calls and shows how dynamically analyzing the behavior of malicious activity through system calls during runtime makes it easier and is more effective to detect malicious applications. Experimental analysis and results shows the efficiency and effectiveness of our approach.
2020-11-09
Kemp, C., Calvert, C., Khoshgoftaar, T..  2018.  Utilizing Netflow Data to Detect Slow Read Attacks. 2018 IEEE International Conference on Information Reuse and Integration (IRI). :108–116.
Attackers can leverage several techniques to compromise computer networks, ranging from sophisticated malware to DDoS (Distributed Denial of Service) attacks that target the application layer. Application layer DDoS attacks, such as Slow Read, are implemented with just enough traffic to tie up CPU or memory resources causing web and application servers to go offline. Such attacks can mimic legitimate network requests making them difficult to detect. They also utilize less volume than traditional DDoS attacks. These low volume attack methods can often go undetected by network security solutions until it is too late. In this paper, we explore the use of machine learners for detecting Slow Read DDoS attacks on web servers at the application layer. Our approach uses a generated dataset based upon Netflow data collected at the application layer on a live network environment. Our Netflow data uses the IP Flow Information Export (IPFIX) standard providing significant flexibility and features. These Netflow features can process and handle a growing amount of traffic and have worked well in our previous DDoS work detecting evasion techniques. Our generated dataset consists of real-world network data collected from a production network. We use eight different classifiers to build Slow Read attack detection models. Our wide selection of learners provides us with a more comprehensive analysis of Slow Read detection models. Experimental results show that the machine learners were quite successful in identifying the Slow Read attacks with a high detection and low false alarm rate. The experiment demonstrates that our chosen Netflow features are discriminative enough to detect such attacks accurately.
2020-11-04
Apruzzese, G., Colajanni, M., Ferretti, L., Marchetti, M..  2019.  Addressing Adversarial Attacks Against Security Systems Based on Machine Learning. 2019 11th International Conference on Cyber Conflict (CyCon). 900:1—18.

Machine-learning solutions are successfully adopted in multiple contexts but the application of these techniques to the cyber security domain is complex and still immature. Among the many open issues that affect security systems based on machine learning, we concentrate on adversarial attacks that aim to affect the detection and prediction capabilities of machine-learning models. We consider realistic types of poisoning and evasion attacks targeting security solutions devoted to malware, spam and network intrusion detection. We explore the possible damages that an attacker can cause to a cyber detector and present some existing and original defensive techniques in the context of intrusion detection systems. This paper contains several performance evaluations that are based on extensive experiments using large traffic datasets. The results highlight that modern adversarial attacks are highly effective against machine-learning classifiers for cyber detection, and that existing solutions require improvements in several directions. The paper paves the way for more robust machine-learning-based techniques that can be integrated into cyber security platforms.

Flores, P..  2019.  Digital Simulation in the Virtual World: Its Effect in the Knowledge and Attitude of Students Towards Cybersecurity. 2019 Sixth HCT Information Technology Trends (ITT). :1—5.

The search for alternative delivery modes to teaching has been one of the pressing concerns of numerous educational institutions. One key innovation to improve teaching and learning is e-learning which has undergone enormous improvements. From its focus on text-based environment, it has evolved into Virtual Learning Environments (VLEs) which provide more stimulating and immersive experiences among learners and educators. An example of VLEs is the virtual world which is an emerging educational platform among universities worldwide. One very interesting topic that can be taught using the virtual world is cybersecurity. Simulating cybersecurity in the virtual world may give a realistic experience to students which can be hardly achieved by classroom teaching. To date, there are quite a number of studies focused on cybersecurity awareness and cybersecurity behavior. But none has focused looking into the effect of digital simulation in the virtual world, as a new educational platform, in the cybersecurity attitude of the students. It is in this regard that this study has been conducted by designing simulation in the virtual world lessons that teaches the five aspects of cybersecurity namely; malware, phishing, social engineering, password usage and online scam, which are the most common cybersecurity issues. The study sought to examine the effect of this digital simulation design in the cybersecurity knowledge and attitude of the students. The result of the study ascertains that students exposed under simulation in the virtual world have a greater positive change in cybersecurity knowledge and attitude than their counterparts.

2020-10-30
Basu, Kanad, Elnaggar, Rana, Chakrabarty, Krishnendu, Karri, Ramesh.  2019.  PREEMPT: PReempting Malware by Examining Embedded Processor Traces. 2019 56th ACM/IEEE Design Automation Conference (DAC). :1—6.

Anti-virus software (AVS) tools are used to detect Malware in a system. However, software-based AVS are vulnerable to attacks. A malicious entity can exploit these vulnerabilities to subvert the AVS. Recently, hardware components such as Hardware Performance Counters (HPC) have been used for Malware detection. In this paper, we propose PREEMPT, a zero overhead, high-accuracy and low-latency technique to detect Malware by re-purposing the embedded trace buffer (ETB), a debug hardware component available in most modern processors. The ETB is used for post-silicon validation and debug and allows us to control and monitor the internal activities of a chip, beyond what is provided by the Input/Output pins. PREEMPT combines these hardware-level observations with machine learning-based classifiers to preempt Malware before it can cause damage. There are many benefits of re-using the ETB for Malware detection. It is difficult to hack into hardware compared to software, and hence, PREEMPT is more robust against attacks than AVS. PREEMPT does not incur performance penalties. Finally, PREEMPT has a high True Positive value of 94% and maintains a low False Positive value of 2%.

2020-10-29
Belenko, Viacheslav, Krundyshev, Vasiliy, Kalinin, Maxim.  2019.  Intrusion detection for Internet of Things applying metagenome fast analysis. 2019 Third World Conference on Smart Trends in Systems Security and Sustainablity (WorldS4). :129—135.
Today, intrusion detection and prevention systems (IDS / IPS) are a necessary element of protection against network attacks. The main goal of such systems is to identify an unauthorized access to the network and take appropriate countermeasures: alarming security officers about intrusion, reconfiguration of firewall to block further acts of the attacker, protection against cyberattacks and malware. For traditional computer networks there are a large number of sufficiently effective approaches for protection against malicious activity, however, for the rapidly developing dynamic adhoc networks (Internet of Things - IoT, MANET, WSN, etc.) the task of creating a universal protection means is quite acute. In this paper, we review various methods for detecting polymorphic intrusion activity (polymorphic viral code and sequences of operations), present a comparative analysis, and implement the suggested technology for detecting polymorphic chains of operations using bioinformatics for IoT. The proposed approach has been tested with different lengths of operation sequences and different k-measures, as a result of which the optimal parameters of the proposed method have been determined.
Vi, Bao Ngoc, Noi Nguyen, Huu, Nguyen, Ngoc Tran, Truong Tran, Cao.  2019.  Adversarial Examples Against Image-based Malware Classification Systems. 2019 11th International Conference on Knowledge and Systems Engineering (KSE). :1—5.

Malicious software, known as malware, has become urgently serious threat for computer security, so automatic mal-ware classification techniques have received increasing attention. In recent years, deep learning (DL) techniques for computer vision have been successfully applied for malware classification by visualizing malware files and then using DL to classify visualized images. Although DL-based classification systems have been proven to be much more accurate than conventional ones, these systems have been shown to be vulnerable to adversarial attacks. However, there has been little research to consider the danger of adversarial attacks to visualized image-based malware classification systems. This paper proposes an adversarial attack method based on the gradient to attack image-based malware classification systems by introducing perturbations on resource section of PE files. The experimental results on the Malimg dataset show that by a small interference, the proposed method can achieve success attack rate when challenging convolutional neural network malware classifiers.

Xylogiannopoulos, Konstantinos F., Karampelas, Panagiotis, Alhajj, Reda.  2019.  Text Mining for Malware Classification Using Multivariate All Repeated Patterns Detection. 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). :887—894.

Mobile phones have become nowadays a commodity to the majority of people. Using them, people are able to access the world of Internet and connect with their friends, their colleagues at work or even unknown people with common interests. This proliferation of the mobile devices has also been seen as an opportunity for the cyber criminals to deceive smartphone users and steel their money directly or indirectly, respectively, by accessing their bank accounts through the smartphones or by blackmailing them or selling their private data such as photos, credit card data, etc. to third parties. This is usually achieved by installing malware to smartphones masking their malevolent payload as a legitimate application and advertise it to the users with the hope that mobile users will install it in their devices. Thus, any existing application can easily be modified by integrating a malware and then presented it as a legitimate one. In response to this, scientists have proposed a number of malware detection and classification methods using a variety of techniques. Even though, several of them achieve relatively high precision in malware classification, there is still space for improvement. In this paper, we propose a text mining all repeated pattern detection method which uses the decompiled files of an application in order to classify a suspicious application into one of the known malware families. Based on the experimental results using a real malware dataset, the methodology tries to correctly classify (without any misclassification) all randomly selected malware applications of 3 categories with 3 different families each.

Choi, Seok-Hwan, Shin, Jin-Myeong, Liu, Peng, Choi, Yoon-Ho.  2019.  Robustness Analysis of CNN-based Malware Family Classification Methods Against Various Adversarial Attacks. 2019 IEEE Conference on Communications and Network Security (CNS). :1—6.

As malware family classification methods, image-based classification methods have attracted much attention. Especially, due to the fast classification speed and the high classification accuracy, Convolutional Neural Network (CNN)-based malware family classification methods have been studied. However, previous studies on CNN-based classification methods focused only on improving the classification accuracy of malware families. That is, previous studies did not consider the cases that the accuracy of CNN-based malware classification methods can be decreased under the existence of adversarial attacks. In this paper, we analyze the robustness of various CNN-based malware family classification models under adversarial attacks. While adding imperceptible non-random perturbations to the input image, we measured how the accuracy of the CNN-based malware family classification model can be affected. Also, we showed the influence of three significant visualization parameters(i.e., the size of input image, dimension of input image, and conversion color of a special character)on the accuracy variation under adversarial attacks. From the evaluation results using the Microsoft malware dataset, we showed that even the accuracy over 98% of the CNN-based malware family classification method can be decreased to less than 7%.

Roseline, S. Abijah, Sasisri, A. D., Geetha, S., Balasubramanian, C..  2019.  Towards Efficient Malware Detection and Classification using Multilayered Random Forest Ensemble Technique. 2019 International Carnahan Conference on Security Technology (ICCST). :1—6.

The exponential growth rate of malware causes significant security concern in this digital era to computer users, private and government organizations. Traditional malware detection methods employ static and dynamic analysis, which are ineffective in identifying unknown malware. Malware authors develop new malware by using polymorphic and evasion techniques on existing malware and escape detection. Newly arriving malware are variants of existing malware and their patterns can be analyzed using the vision-based method. Malware patterns are visualized as images and their features are characterized. The alternative generation of class vectors and feature vectors using ensemble forests in multiple sequential layers is performed for classifying malware. This paper proposes a hybrid stacked multilayered ensembling approach which is robust and efficient than deep learning models. The proposed model outperforms the machine learning and deep learning models with an accuracy of 98.91%. The proposed system works well for small-scale and large-scale data since its adaptive nature of setting parameters (number of sequential levels) automatically. It is computationally efficient in terms of resources and time. The method uses very fewer hyper-parameters compared to deep neural networks.

Priyamvada Davuluru, Venkata Salini, Narayanan Narayanan, Barath, Balster, Eric J..  2019.  Convolutional Neural Networks as Classification Tools and Feature Extractors for Distinguishing Malware Programs. 2019 IEEE National Aerospace and Electronics Conference (NAECON). :273—278.

Classifying malware programs is a research area attracting great interest for Anti-Malware industry. In this research, we propose a system that visualizes malware programs as images and distinguishes those using Convolutional Neural Networks (CNNs). We study the performance of several well-established CNN based algorithms such as AlexNet, ResNet and VGG16 using transfer learning approaches. We also propose a computationally efficient CNN-based architecture for classification of malware programs. In addition, we study the performance of these CNNs as feature extractors by using Support Vector Machine (SVM) and K-nearest Neighbors (kNN) for classification purposes. We also propose fusion methods to boost the performance further. We make use of the publicly available database provided by Microsoft Malware Classification Challenge (BIG 2015) for this study. Our overall performance is 99.4% for a set of 2174 test samples comprising 9 different classes thereby setting a new benchmark.

Wei, Qu, Xiao, Shi, Dongbao, Li.  2019.  Malware Classification System Based on Machine Learning. 2019 Chinese Control And Decision Conference (CCDC). :647—652.

The main challenge for malware researchers is the large amount of data and files that need to be evaluated for potential threats. Researchers analyze a large number of new malware daily and classify them in order to extract common features. Therefore, a system that can ensure and improve the efficiency and accuracy of the classification is of great significance for the study of malware characteristics. A high-performance, high-efficiency automatic classification system based on multi-feature selection fusion of machine learning is proposed in this paper. Its performance and efficiency, according to our experiments, have been greatly improved compared to single-featured systems.

Mahajan, Ginika, Saini, Bhavna, Anand, Shivam.  2019.  Malware Classification Using Machine Learning Algorithms and Tools. 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP). :1—8.

Malware classification is the process of categorizing the families of malware on the basis of their signatures. This work focuses on classifying the emerging malwares on the basis of comparable features of similar malwares. This paper proposes a novel framework that categorizes malware samples into their families and can identify new malware samples for analysis. For this six diverse classification techniques of machine learning are used. To get more comparative and thus accurate classification results, analysis is done using two different tools, named as Knime and Orange. The work proposed can help in identifying and thus cleaning new malwares and classifying malware into their families. The correctness of family classification of malwares is investigated in terms of confusion matrix, accuracy and Cohen's Kappa. After evaluation it is analyzed that Random Forest gives the highest accuracy.

Lo, Wai Weng, Yang, Xu, Wang, Yapeng.  2019.  An Xception Convolutional Neural Network for Malware Classification with Transfer Learning. 2019 10th IFIP International Conference on New Technologies, Mobility and Security (NTMS). :1—5.

In this work, we applied a deep Convolutional Neural Network (CNN) with Xception model to perform malware image classification. The Xception model is a recently developed special CNN architecture that is more powerful with less over- fitting problems than the current popular CNN models such as VGG16. However only a few use cases of the Xception model can be found in literature, and it has never been used to solve the malware classification problem. The performance of our approach was compared with other methods including KNN, SVM, VGG16 etc. The experiments on two datasets (Malimg and Microsoft Malware Dataset) demonstrated that the Xception model can achieve the highest training accuracy than all other approaches including the champion approach, and highest validation accuracy than all other approaches including VGG16 model which are using image-based malware classification (except the champion solution as this information was not provided). Additionally, we proposed a novel ensemble model to combine the predictions from .bytes files and .asm files, showing that a lower logloss can be achieved. Although the champion on the Microsoft Malware Dataset achieved a bit lower logloss, our approach does not require any features engineering, making it more effective to adapt to any future evolution in malware, and very much less time consuming than the champion's solution.

2020-10-26
Black, Paul, Gondal, Iqbal, Vamplew, Peter, Lakhotia, Arun.  2019.  Evolved Similarity Techniques in Malware Analysis. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :404–410.

Malware authors are known to reuse existing code, this development process results in software evolution and a sequence of versions of a malware family containing functions that show a divergence from the initial version. This paper proposes the term evolved similarity to account for this gradual divergence of similarity across the version history of a malware family. While existing techniques are able to match functions in different versions of malware, these techniques work best when the version changes are relatively small. This paper introduces the concept of evolved similarity and presents automated Evolved Similarity Techniques (EST). EST differs from existing malware function similarity techniques by focusing on the identification of significantly modified functions in adjacent malware versions and may also be used to identify function similarity in malware samples that differ by several versions. The challenge in identifying evolved malware function pairs lies in identifying features that are relatively invariant across evolved code. The research in this paper makes use of the function call graph to establish these features and then demonstrates the use of these techniques using Zeus malware.