Biblio
Exploits based on ROP (Return-Oriented Programming) are increasingly present in advanced attack scenarios. Testing systems for ROP-based attacks can be valuable for improving the security and reliability of software. In this paper, we propose ROPMATE, the first Visual Analytics system specifically designed to assist human red team ROP exploit builders. In contrast, previous ROP tools typically require users to inspect a puzzle of hundreds or thousands of lines of textual information, making it a daunting task. ROPMATE presents builders with a clear interface of well-defined and semantically meaningful gadgets, i.e., fragments of code already present in the binary application that can be chained to form fully-functional exploits. The system supports incrementally building exploits by suggesting gadget candidates filtered according to constraints on preserved registers and accessed memory. Several visual aids are offered to identify suitable gadgets and assemble them into semantically correct chains. We report on a preliminary user study that shows how ROPMATE can assist users in building ROP chains.
Software deobfuscation is a key challenge in malware analysis to understand the internal logic of the code and establish adequate countermeasures. In order to defeat recent obfuscation techniques, state-of-the-art generic deobfuscation methodologies are based on dynamic symbolic execution (DSE). However, DSE suffers from limitations such as code coverage and scalability. In the race to counter and remove the most advanced obfuscation techniques, there is a need to reduce the amount of code to cover. To that extend, we propose a novel deobfuscation approach based on semantic equivalence, called DoSE. With DoSE, we aim to improve and complement DSE-based deobfuscation techniques by statically eliminating obfuscation transformations (built on code-reuse). This improves the code coverage. Our method's novelty comes from the transposition of existing binary diffing techniques, namely semantic equivalence checking, to the purpose of the deobfuscation of untreated techniques, such as two-way opaque constructs, that we encounter in surreptitious software. In order to challenge DoSE, we used both known malwares such as Cryptowall, WannaCry, Flame and BitCoinMiner and obfuscated code samples. Our experimental results show that DoSE is an efficient strategy of detecting obfuscation transformations based on code-reuse with low rates of false positive and/or false negative results in practice, and up to 63% of code reduction on certain types of malwares.
Recently researchers have proposed using deep learning-based systems for malware detection. Unfortunately, all deep learning classification systems are vulnerable to adversarial learning-based attacks, or adversarial attacks, where miscreants can avoid detection by the classification algorithm with very few perturbations of the input data. Previous work has studied adversarial attacks against static analysis-based malware classifiers which only classify the content of the unknown file without execution. However, since the majority of malware is either packed or encrypted, malware classification based on static analysis often fails to detect these types of files. To overcome this limitation, anti-malware companies typically perform dynamic analysis by emulating each file in the anti-malware engine or performing in-depth scanning in a virtual machine. These strategies allow the analysis of the malware after unpacking or decryption. In this work, we study different strategies of crafting adversarial samples for dynamic analysis. These strategies operate on sparse, binary inputs in contrast to continuous inputs such as pixels in images. We then study the effects of two, previously proposed defensive mechanisms against crafted adversarial samples including the distillation and ensemble defenses. We also propose and evaluate the weight decay defense. Experiments show that with these three defenses, the number of successfully crafted adversarial samples is reduced compared to an unprotected baseline system. In particular, the ensemble defense is the most resilient to adversarial attacks. Importantly, none of the defenses significantly reduce the classification accuracy for detecting malware. Finally, we show that while adding additional hidden layers to neural models does not significantly improve the malware classification accuracy, it does significantly increase the classifier's robustness to adversarial attacks.
As the worldwide internet has non-stop developments, it comes with enormous amount automatically generated malware. Those malware had become huge threaten to computer users. A comprehensive malware family classifier can help security researchers to quickly identify characteristics of malware which help malware analysts to investigate in more efficient way. However, despite the assistance of the artificial intelligent (AI) classifiers, it has been shown that the AI-based classifiers are vulnerable to so-called adversarial attacks. In this paper, we demonstrate how the adversarial settings can be applied to the classifier of malware families classification. Our experimental results achieved high successful rate through the adversarial attack. We also find the important features which are ignored by malware analysts but useful in the future analysis.
Modern malware applies a rich arsenal of evasion techniques to render dynamic analysis ineffective. In turn, dynamic analysis tools take great pains to hide themselves from malware; typically this entails trying to be as faithful as possible to the behavior of a real run. We present a novel approach to malware analysis that turns this idea on its head, using an extreme abstraction of the operating system that intentionally strays from real behavior. The key insight is that the presence of malicious behavior is sufficient evidence of malicious intent, even if the path taken is not one that could occur during a real run of the sample. By exploring multiple paths in a system that only approximates the behavior of a real system, we can discover behavior that would often be hard to elicit otherwise. We aggregate features from multiple paths and use a funnel-like configuration of machine learning classifiers to achieve high accuracy without incurring too much of a performance penalty. We describe our system, TAMALES (The Abstract Malware Analysis LEarning System), in detail and present machine learning results using a 330K sample set showing an FPR (False Positive Rate) of 0.10% with a TPR (True Positive Rate) of 99.11%, demonstrating that extreme abstraction can be extraordinarily effective in providing data that allows a classifier to accurately detect malware.
Malware detection is an indispensable factor in security of internet oriented machines. The combinations of different features are used for dynamic malware analysis. The different combinations are generated from APIs, Summary Information, DLLs and Registry Keys Changed. Cuckoo sandbox is used for dynamic malware analysis, which is customizable, and provide good accuracy. More than 2300 features are extracted from dynamic analysis of malware and 92 features are extracted statically from binary malware using PEFILE. Static features are extracted from 39000 malicious binaries and 10000 benign files. Dynamically 800 benign files and 2200 malware files are analyzed in Cuckoo Sandbox and 2300 features are extracted. The accuracy of dynamic malware analysis is 94.64% while static analysis accuracy is 99.36%. The dynamic malware analysis is not effective due to tricky and intelligent behaviours of malwares. The dynamic analysis has some limitations due to controlled network behavior and it cannot be analyzed completely due to limited access of network.
Malware authors attempt to obfuscate and hide their code in its static and dynamic states. This paper provides a novel approach to aid analysis by intercepting and capturing malware artifacts and providing dynamic control of process flow. Capturing malware artifacts allows an analyst to more quickly and comprehensively understand malware behavior and obfuscation techniques and doing so interactively allows multiple code paths to be explored. The faster that malware can be analyzed the quicker the systems and data compromised by it can be determined and its infection stopped. This research proposes an instantiation of an interactive malware analysis and artifact capture tool.
Recently a huge trend on the internet of things (IoT) and an exponential increase in automated tools are helping malware producers to target IoT devices. The traditional security solutions against malware are infeasible due to low computing power for large-scale data in IoT environment. The number of malware and their variants are increasing due to continuous malware attacks. Consequently, the performance improvement in malware analysis is critical requirement to stop rapid expansion of malicious attacks in IoT environment. To solve this problem, the paper proposed a novel framework for classifying malware in IoT environment. To achieve flne-grained malware classification in suggested framework, the malware image classification system (MICS) is designed for representing malware image globally and locally. MICS first converts the suspicious program into the gray-scale image and then captures hybrid local and global malware features to perform malware family classification. Preliminary experimental outcomes of MICS are quite promising with 97.4% classification accuracy on 9342 windows suspicious programs of 25 families. The experimental results indicate that proposed framework is quite capable to process large-scale IoT malware.
Malicious software or malware is one of the most significant dangers facing the Internet today. In the fight against malware, users depend on anti-malware and anti-virus products to proactively detect threats before damage is done. Those products rely on static signatures obtained through malware analysis. Unfortunately, malware authors are always one step ahead in avoiding detection. This research deals with dynamic malware analysis, which emphasizes on: how the malware will behave after execution, what changes to the operating system, registry and network communication take place. Dynamic analysis opens up the doors for automatic generation of anomaly and active signatures based on the new malware's behavior. The research includes a design of honeypot to capture new malware and a complete dynamic analysis laboratory setting. We propose a standard analysis methodology by preparing the analysis tools, then running the malicious samples in a controlled environment to investigate their behavior. We analyze 173 recent Phishing emails and 45 SPIM messages in search for potentially new malwares, we present two malware samples and their comprehensive dynamic analysis.
Nowadays, the digitization of the world is under a serious threat due to the emergence of various new and complex malware every day. Due to this, the traditional signature-based methods for detection of malware effectively become an obsolete method. The efficiency of the machine learning techniques in context to the detection of malwares has been proved by state-of-the-art research works. In this paper, we have proposed a framework to detect and classify different files (e.g., exe, pdf, php, etc.) as benign and malicious using two level classifier namely, Macro (for detection of malware) and Micro (for classification of malware files as a Trojan, Spyware, Ad-ware, etc.). Our solution uses Cuckoo Sandbox for generating static and dynamic analysis report by executing the sample files in the virtual environment. In addition, a novel feature extraction module has been developed which functions based on static, behavioral and network analysis using the reports generated by the Cuckoo Sandbox. Weka Framework is used to develop machine learning models by using training datasets. The experimental results using the proposed framework shows high detection rate and high classification rate using different machine learning algorithms
The ever-increasing number of malware samples demands for automated tools that aid the analysts in the reverse engineering of complex malicious binaries. Frequently, malware communicates over an encrypted channel with external network resources under the control of malicious actors, such as Command and Control servers that control the botnet of infected machines. Hence, a key aspect in malware analysis is uncovering and understanding the semantics of network communications. In this paper we present SysTaint, a semi-automated tool that runs malware samples in a controlled environment and analyzes their execution to support the analyst in identifying the functions involved in the communication and the exchanged data. Our evaluation on four banking Trojan samples from different families shows that SysTaint is able to handle and inspect encrypted network communications, obtaining useful information on the data being sent and received, on how each sample processes this data, and on the inner portions of code that deal with the data processing.
The increasing amount of malware variants seen in the wild is causing problems for Antivirus Software vendors, unable to keep up by creating signatures for each. The methods used to develop a signature, static and dynamic analysis, have various limitations. Machine learning has been used by Antivirus vendors to detect malware based on the information gathered from the analysis process. However, adversarial examples can cause machine learning algorithms to miss-classify new data. In this paper we describe a method for malware analysis by converting malware binaries to images and then preparing those images for training within a Generative Adversarial Network. These unsupervised deep neural networks are not susceptible to adversarial examples. The conversion to images from malware binaries should be faster than using dynamic analysis and it would still be possible to link malware families together. Using the Generative Adversarial Network, malware detection could be much more effective and reliable.
In this paper we propose a new algorithm to detect Advanced Persistent Threats (APT's) that relies on a graph model of HTTP traffic. We also implement a complete detection system with a web interface that allows to interactively analyze the data. We perform a complete parameter study and experimental evaluation using data collected on a real network. The results show that the performance of our system is comparable to currently available antiviruses, although antiviruses use signatures to detect known malwares while our algorithm solely uses behavior analysis to detect new undocumented attacks.
Better understanding of mobile applications' behaviors would lead to better malware detection/classification and better app recommendation for users. In this work, we design a framework AppDNA to automatically generate a compact representation for each app to comprehensively profile its behaviors. The behavior difference between two apps can be measured by the distance between their representations. As a result, the versatile representation can be generated once for each app, and then be used for a wide variety of objectives, including malware detection, app categorizing, plagiarism detection, etc. Based on a systematic and deep understanding of an app's behavior, we propose to perform a function-call-graph-based app profiling. We carefully design a graph-encoding method to convert a typically extremely large call-graph to a 64-dimension fix-size vector to achieve robust app profiling. Our extensive evaluations based on 86,332 benign and malicious apps demonstrate that our system performs app profiling (thus malware detection, classification, and app recommendation) to a high accuracy with extremely low computation cost: it classifies 4024 (benign/malware) apps using around 5.06 second with accuracy about 93.07%; it classifies 570 malware's family (total 21 families) using around 0.83 second with accuracy 82.3%; it classifies 9,730 apps' functionality with accuracy 33.3% for a total of 7 categories and accuracy of 88.1 % for 2 categories.
In this paper we present a new approach, named DLGraph, for malware detection using deep learning and graph embedding. DLGraph employs two stacked denoising autoencoders (SDAs) for representation learning, taking into consideration computer programs' function-call graphs and Windows application programming interface (API) calls. Given a program, we first use a graph embedding technique that maps the program's function-call graph to a vector in a low-dimensional feature space. One SDA in our deep learning model is used to learn a latent representation of the embedded vector of the function-call graph. The other SDA in our model is used to learn a latent representation of the given program's Windows API calls. The two learned latent representations are then merged to form a combined feature vector. Finally, we use softmax regression to classify the combined feature vector for predicting whether the given program is malware or not. Experimental results based on different datasets demonstrate the effectiveness of the proposed approach and its superiority over a related method.
Smartphones have evolved over the years from simple devices to communicate with each other to fully functional portable computers although with comparatively less computational power but inholding multiple applications within. With the smartphone revolution, the value of personal data has increased. As technological complexities increase, so do the vulnerabilities in the system. Smartphones are the latest target for attacks. Android being an open source platform and also the most widely used smartphone OS draws the attention of many malware writers to exploit the vulnerabilities of it. Attackers try to take advantage of these vulnerabilities and fool the user and misuse their data. Malwares have come a long way from simple worms to sophisticated DDOS using Botnets, the latest trends in computer malware tend to go in the distributed direction, to evade the multiple anti-virus apps developed to counter generic viruses and Trojans. However, the recent trend in android system is to have a combination of applications which acts as malware. The applications are benign individually but when grouped, these may result into a malicious activity. This paper proposes a new category of distributed malware in android system, how it can be used to evade the current security, and how it can be detected with the help of graph matching algorithm.
The latent behavior of an information system that can exhibit extreme events, such as system faults or cyber-attacks, is complex. Recently, the invariant network has shown to be a powerful way of characterizing complex system behaviors. Structures and evolutions of the invariance network, in particular, the vanishing correlations, can shed light on identifying causal anomalies and performing system diagnosis. However, due to the dynamic and complex nature of real-world information systems, learning a reliable invariant network in a new environment often requires continuous collecting and analyzing the system surveillance data for several weeks or even months. Although the invariant networks learned from old environments have some common entities and entity relationships, these networks cannot be directly borrowed for the new environment due to the domain variety problem. To avoid the prohibitive time and resource consuming network building process, we propose TINET, a knowledge transfer based model for accelerating invariant network construction. In particular, we first propose an entity estimation model to estimate the probability of each source domain entity that can be included in the final invariant network of the target domain. Then, we propose a dependency construction model for constructing the unbiased dependency relationships by solving a two-constraint optimization problem. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of TINET. We also apply TINET to a real enterprise security system for intrusion detection. TINET achieves superior detection performance at least 20 days lead-lag time in advance with more than 75% accuracy.
Community detection in complex networks is a fundamental problem that attracts much attention across various disciplines. Previous studies have been mostly focusing on external connections between nodes (i.e., topology structure) in the network whereas largely ignoring internal intricacies (i.e., local behavior) of each node. A pair of nodes without any interaction can still share similar internal behaviors. For example, in an enterprise information network, compromised computers controlled by the same intruder often demonstrate similar abnormal behaviors even if they do not connect with each other. In this paper, we study the problem of community detection in enterprise information networks, where large-scale internal events and external events coexist on each host. The discovered host communities, capturing behavioral affinity, can benefit many comparative analysis tasks such as host anomaly assessment. In particular, we propose a novel community detection framework to identify behavior-based host communities in enterprise information networks, purely based on large-scale heterogeneous event data. We continue proposing an efficient method for assessing host's anomaly level by leveraging the detected host communities. Experimental results on enterprise networks demonstrate the effectiveness of our model.
Android applications are usually obfuscated before release, making it difficult to analyze them for malware presence or intellectual property violations. Obfuscators might hide the true intent of code by renaming variables and/or modifying program structures. It is challenging to search for executables relevant to an obfuscated application for developers to analyze efficiently. Prior approaches toward obfuscation resilient search have relied on certain structural parts of apps remaining as landmarks, un-touched by obfuscation. For instance, some prior approaches have assumed that the structural relationships between identifiers are not broken by obfuscators; others have assumed that control flow graphs maintain their structures. Both approaches can be easily defeated by a motivated obfuscator. We present a new approach, MACNETO, to search for programs relevant to obfuscated executables leveraging deep learning and principal components on instructions. MACNETO makes few assumptions about the kinds of modifications that an obfuscator might perform. We show that it has high search precision for executables obfuscated by a state-of-the-art obfuscator that changes control flow. Further, we also demonstrate the potential of MACNETO to help developers understand executables, where MACNETO infers keywords (which are from relevant un-obfuscated programs) for obfuscated executables.