Bibliography
The rapid growth of cybercrime targeting mobile devices calls for an efficient malware analysis platform. With the emergence of evasive malware, which can detect that it is being analyzed in a virtualized environment, bare-metal analysis has become the last resort. Existing work mainly focuses on extracting the malicious behaviors exposed during bare-metal analysis. However, after malware analysis, it is equally important to quickly restore the system to a clean state so the next sample can be examined. Unfortunately, state-of-the-art solutions on mobile platforms can only restore the disk, and require a time-consuming system reboot. In addition, all existing work requires in-guest components to assist the restoration, so kernel-level malware is still able to detect their presence. We propose Bolt, a transparent restoration mechanism for bare-metal analysis on mobile platforms that does not require a reboot. Bolt achieves reboot-less restoration by simultaneously taking a snapshot of both the physical memory and the disk. The memory snapshot is enabled by an isolated operating system (BoltOS) running in the ARM TrustZone secure world, and the disk snapshot is accomplished by customized firmware (BoltFTL) for flash-based block devices. Because both BoltOS and BoltFTL are isolated from the guest system, even kernel-level malware cannot interfere with the restoration. More importantly, Bolt does not require any modifications to the guest system. As such, Bolt is the first system to simultaneously achieve efficiency, isolation, and stealthiness when recovering from infection due to malware execution. We have implemented a Bolt prototype working with the Android OS. Experimental results show that Bolt can restore the guest system to a clean state in only 2.80 seconds.
Malware sandboxes, widely used by antivirus companies, mobile application marketplaces, threat detection appliances, and security researchers, face the challenge of environment-aware malware that alters its behavior once it detects that it is being executed in an analysis environment. Recent efforts attempt to deal with this problem mostly by ensuring that well-known properties of analysis environments are replaced with realistic values, and that any instrumentation artifacts remain hidden. For sandboxes implemented using virtual machines, this can be achieved by scrubbing vendor-specific drivers, processes, BIOS versions, and other VM-revealing indicators, while more sophisticated sandboxes move away from emulation-based and virtualization-based systems towards bare-metal hosts. We observe that as the fidelity and transparency of dynamic malware analysis systems improve, malware authors can resort to other system characteristics that are indicative of artificial environments. We present a novel class of sandbox evasion techniques that exploit the "wear and tear" that inevitably occurs on real systems as a result of normal use. By moving beyond how realistic a system looks, to how realistic its past use looks, malware can effectively evade even sandboxes that do not expose any instrumentation indicators, including bare-metal systems. We investigate the feasibility of this evasion strategy by conducting a large-scale study of wear-and-tear artifacts collected from real user devices and publicly available malware analysis services. The results of our evaluation are alarming: using simple decision trees derived from the analyzed data, malware can determine that a system is an artificial environment and not a real user device with an accuracy of 92.86%. As a step towards defending against wear-and-tear malware evasion, we develop statistical models that capture a system's age and degree of use, which can be used to aid sandbox operators in creating system images that exhibit a realistic wear-and-tear state.
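The decision-tree check described in this abstract is straightforward to picture. Below is a minimal sketch in Python using scikit-learn; the wear-and-tear feature names and the training values are hypothetical stand-ins, not the artifacts measured in the paper.

```python
# Illustrative sketch: classifying a host as "real" vs "sandbox" from
# wear-and-tear features. Feature names and values are hypothetical;
# the paper derives its trees from artifacts measured on real devices.
from sklearn.tree import DecisionTreeClassifier

# Each row: [browser_history_entries, installed_apps, dns_cache_entries,
#            system_log_size_mb] -- all hypothetical wear-and-tear signals.
X = [
    [12000, 87, 310, 540],   # well-used real machine
    [8500, 62, 250, 420],    # real machine
    [40, 12, 3, 6],          # freshly provisioned sandbox image
    [10, 9, 1, 2],           # sandbox image
]
y = [1, 1, 0, 0]             # 1 = real device, 0 = artificial environment

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict([[25, 10, 2, 4]]))  # likely classified as a sandbox
```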
This paper describes the various malware datasets that we have obtained permission to host at the University of Arizona as part of a National Science Foundation funded project. It also describes some other malware datasets that we are in the process of obtaining permission to host at the University of Arizona. We also discuss some preliminary work we have carried out on malware analysis using big data platforms.
Ransomware has been a growing threat since 2012, and the situation continues to worsen. The lack of security mechanisms and security awareness leaves systems mired in ransomware attacks. In this paper, a new framework called 2entFOX is proposed to detect high survivable ransomware (HSR). To our knowledge, this framework can be considered one of the first in ransomware detection, given how little publicly available research exists in this field. We analyzed the behaviour of Windows ransomware and sought features that are particularly useful in detecting this type of malware with high detection accuracy and a low false positive rate. Through extensive experimental analysis we extracted 20 effective features, two of which are highly discriminative, yielding an appropriate feature set for HSR detection. After proposing an architecture based on a Bayesian belief network, we evaluate the framework on known and unknown ransomware samples across six different scenarios. The results of this evaluation show the high accuracy of 2entFOX in detecting HSRs.
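A full Bayesian belief network encodes dependencies among the 20 features; as a rough sketch of how such a detector scores a sample, the snippet below uses the degenerate case of conditionally independent features (naive Bayes). All feature names, probabilities, and the prior are hypothetical, not the paper's.

```python
import math

# Hypothetical per-feature likelihoods P(feature | ransomware) and
# P(feature | benign); a real Bayesian belief network would also
# encode dependencies between features.
FEATURES = {
    "mass_file_encryption": (0.95, 0.01),
    "shadow_copy_deletion": (0.80, 0.02),
    "ransom_note_dropped":  (0.90, 0.001),
}
PRIOR_MALICIOUS = 0.5

def log_posterior_odds(observed):
    """Sum of log-likelihood ratios for the observed binary features."""
    odds = math.log(PRIOR_MALICIOUS / (1 - PRIOR_MALICIOUS))
    for name, present in observed.items():
        p_mal, p_ben = FEATURES[name]
        if present:
            odds += math.log(p_mal / p_ben)
        else:
            odds += math.log((1 - p_mal) / (1 - p_ben))
    return odds

sample = {"mass_file_encryption": True,
          "shadow_copy_deletion": True,
          "ransom_note_dropped": False}
print("flag as HSR:", log_posterior_odds(sample) > 0)
```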
In recent years, the number of new malware samples has continued to increase. To create effective countermeasures, security specialists often must manually inspect vast sandbox logs produced by dynamic analysis. Meanwhile, antivirus vendors usually publish malware analysis reports on their websites. Because malware analysis reports and sandbox logs have no direct connection, security specialists analyzing sandbox logs cannot benefit from the information described in such expert reports. To address this issue, we developed a system called ReGenerator that automates the generation of reports related to sandbox logs by making use of existing reports published by antivirus vendors. Our system combines several techniques, including Jaccard similarity, natural language processing (NLP), and natural language generation (NLG), to produce concise human-readable reports describing malicious behavior for security specialists.
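Jaccard similarity, the matching primitive named in this abstract, is simple to state: the size of the intersection of two sets divided by the size of their union. A minimal Python sketch, with hypothetical tokens standing in for sandbox-log and report features:

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

# Hypothetical tokens drawn from a sandbox log and a vendor report.
log_tokens = {"createremotethread", "regsetvalue", "connect", "cmd.exe"}
report_tokens = {"createremotethread", "regsetvalue", "persistence"}
print(jaccard(log_tokens, report_tokens))  # 2 shared / 5 total = 0.4
```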
We present an overview of the field of malware analysis with emphasis on issues related to document engineering. We will introduce the field with a discussion of the types of malware, including executable binaries, malicious PDFs, polymorphic malware, ransomware, and exploit kits. We will conclude with our view of important research questions in the field. This is an updated version of last year's tutorial, with more information about web-based malware and malware targeting the Android market.
In an attempt to extract useful information about the behavior of new malware families, threat analysts commonly force newly collected malicious software samples to run within a sandboxed environment. The main goal is to gather intelligence that can later be leveraged to detect and enumerate new malware infections within a network. Currently, most analysis environments "blindly" execute each newly collected malware sample for a predetermined amount of time (e.g., four to five minutes). However, a large majority of malware samples that are forced through sandbox execution are simply repackaged versions of previously seen (and already analyzed) malware. Consequently, a significant amount of time may be wasted in analyzing samples that do not generate new intelligence. In this paper, we propose MAXS, a novel probabilistic multi-hypothesis testing framework for scaling execution in malware analysis environments, including bare-metal execution environments. Our main goal is to automatically recognize whether a malware sample that is undergoing dynamic analysis has likely been seen before (e.g., in a "differently packed" form), and determine whether we could therefore stop its execution early without losing valuable malware intelligence (e.g., without missing DNS queries to never-before-seen malware command-and-control domains). We have tested our prototype implementation of MAXS over two large collections of malware execution traces obtained from two distinct production-level analysis environments. Our experimental results show that using MAXS we are able to reduce malware execution time by up to 50% on average, with less than 0.3% information loss. This roughly translates into the ability to double the capacity of malware sandbox environments, thus significantly optimizing the resources dedicated to malware execution and analysis. Our results are particularly important for bare-metal execution environments, in which it is not easy to leverage the economies of scale that characterize virtual-machine or emulation-based malware sandboxes. For example, MAXS could be used to significantly cut the cost of bare-metal analysis environments by reducing the hardware resources needed to analyze a predetermined daily number of new malware samples.
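To see the flavour of sequential testing for early termination (the paper's multi-hypothesis framework is more involved than this), here is a minimal two-hypothesis sequential probability ratio test in Python; the match probabilities and decision thresholds are hypothetical.

```python
import math

# H1: sample is a repackaged, already-analyzed variant; H0: it is new.
# Hypothetical probabilities that an observed behaviour event matches
# the candidate's known trace under each hypothesis.
P_MATCH_H1, P_MATCH_H0 = 0.9, 0.3
A = math.log(99)      # accept H1 once the odds exceed ~99:1
B = math.log(1 / 99)  # accept H0 once the odds drop below ~1:99

def decide(match_stream):
    llr = 0.0  # running log-likelihood ratio
    for i, matched in enumerate(match_stream, 1):
        p1 = P_MATCH_H1 if matched else 1 - P_MATCH_H1
        p0 = P_MATCH_H0 if matched else 1 - P_MATCH_H0
        llr += math.log(p1 / p0)
        if llr >= A:
            return f"seen before: stop execution early after {i} events"
        if llr <= B:
            return f"new sample: keep executing (decided after {i} events)"
    return "undecided: run to timeout"

print(decide([True, True, True, True, True]))
```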
Graph reordering is a powerful technique to increase the locality of graph representations, which is helpful in several applications. We study how the technique can be used to improve compression of graphs and inverted indexes. We extend the recent theoretical model of Chierichetti et al. (KDD 2009) for graph compression, and show how it can be employed for compression-friendly reordering of social networks and web graphs and for assigning document identifiers in inverted indexes. We design and implement a novel, theoretically sound reordering algorithm based on recursive graph bisection. Our experiments show a significant improvement in the compression rate of graphs and indexes over existing heuristics. The new method is relatively simple and allows efficient parallel and distributed implementations, which we demonstrate on graphs with billions of vertices and hundreds of billions of edges.
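The overall shape of reordering by recursive bisection is easy to sketch: split the vertex set, recurse on each half, and concatenate the induced orders to obtain new consecutive identifiers. The Python sketch below shows only this skeleton; the paper's algorithm additionally optimizes each split against a compression-cost objective, which the placeholder `naive_split` omits.

```python
def recursive_bisection(vertices, bisect):
    """Return vertices in the order induced by recursive bisection.

    `bisect` splits a vertex list into two halves; the real algorithm
    chooses the split that minimizes a compression-cost objective
    (e.g. by swapping vertices between halves), which is omitted here.
    """
    if len(vertices) <= 1:
        return list(vertices)
    left, right = bisect(vertices)
    return recursive_bisection(left, bisect) + recursive_bisection(right, bisect)

# Trivial placeholder split; a real implementation optimizes it.
naive_split = lambda vs: (vs[:len(vs) // 2], vs[len(vs) // 2:])

order = recursive_bisection(["g", "a", "d", "b", "f", "c", "e", "h"], naive_split)
new_id = {v: i for i, v in enumerate(order)}  # compression-friendly IDs
print(new_id)
```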
Tree structures such as breadth-first search (BFS) trees and minimum spanning trees (MST) are among the most fundamental graph structures in distributed network algorithms. However, by definition, these structures are not robust against failures, and even a single edge's removal can disrupt their functionality. A well-studied concept which attempts to circumvent this issue is Fault-Tolerant Tree Structures, where the tree gets augmented with additional edges from the network so that the functionality of the structure is maintained even when an edge fails. These structures, or other equivalent formulations, have been studied extensively from a centralized viewpoint. However, despite the fact that the main motivations come from distributed networks, their distributed construction has not been addressed before. In this paper, we present distributed algorithms for constructing fault-tolerant BFS and MST structures. The time complexities of our algorithms are nearly optimal in the following strong sense: they almost match even the lower bounds for constructing (basic) BFS and MST trees.
Let G be a finite connected graph, and let ρ be the spectral radius of its universal cover. For example, if G is k-regular then ρ = 2√(k−1). We show that for every r, there is an r-covering (a.k.a. an r-lift) of G where all the new eigenvalues are bounded from above by ρ. It follows that a bipartite Ramanujan graph has a Ramanujan r-covering for every r. This generalizes the r=2 case due to Marcus, Spielman and Srivastava (2013). Every r-covering of G corresponds to a labeling of the edges of G by elements of the symmetric group Sr. We generalize this notion to labeling the edges by elements of various groups and present a broader scenario where Ramanujan coverings are guaranteed to exist. In particular, this shows the existence of richer families of bipartite Ramanujan graphs than was known before. Inspired by Marcus-Spielman-Srivastava, a crucial component of our proof is the existence of interlacing families of polynomials for complex reflection groups. The core argument of this component is taken from Marcus-Spielman-Srivastava (2015). Another important ingredient of our proof is a new generalization of the matching polynomial of a graph. We define the r-th matching polynomial of G to be the average matching polynomial of all r-coverings of G. We show this polynomial shares many properties with the original matching polynomial. For example, it is real rooted with all its roots inside [−ρ,ρ].
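The r-th matching polynomial defined in this abstract can be written compactly; in the (assumed) notation below, μ_H is the usual matching polynomial and C_r(G) denotes the set of r-coverings of G:

```latex
% The r-th matching polynomial as defined in the abstract: the average
% of the matching polynomials over all r-coverings of G.
\[
  \mu_{G,r}(x) \;=\; \mathbb{E}_{H \in \mathcal{C}_r(G)}\bigl[\mu_H(x)\bigr],
\]
% where C_r(G) is the set of r-coverings of G and mu_H is the usual
% matching polynomial. The paper shows mu_{G,r} is real rooted with all
% roots in [-rho, rho], rho the spectral radius of the universal cover.
```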
In this paper, we propose a new risk analysis framework that enables the supervision of risks in complex and distributed systems. Our contribution is twofold. First, we provide Risk Assessment Graphs (RAGs) as a model for risk analysis. This graph-based model is adaptable to system changes over time. We also introduce the potentiality and accessibility functions, which, during each time slot, evaluate respectively the chance of exploiting a RAG node and the connection time between nodes. In addition, we provide a worst-case risk evaluation approach, based on the assumption that intruders usually aim to maximise their benefit by inflicting maximum damage on the target system (i.e., choosing the most likely paths in the RAG). We then introduce three security metrics: the propagated risk, the node risk, and the global risk. We illustrate the use of our framework through the simple example of an enterprise email service. Our framework achieves both flexibility and generality: it can be used to assess external threats as well as insider ones, and it applies to a wide range of applications.
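The worst-case assumption, that an intruder follows the most likely path through the RAG, can be illustrated with a shortest-path computation on negated log-probabilities. In the sketch below, each edge carries a single hypothetical probability (collapsing the paper's separate potentiality and accessibility functions into one number), and the graph is a toy version of the enterprise email example.

```python
import heapq, math

# Hypothetical RAG: edges carry the probability that an intruder can
# move between nodes during a time slot.
RAG = {
    "internet": [("mail_gateway", 0.6), ("vpn", 0.2)],
    "mail_gateway": [("mail_server", 0.7)],
    "vpn": [("mail_server", 0.9)],
    "mail_server": [],
}

def most_likely_path_risk(graph, source, target):
    """Max-product path probability via Dijkstra on -log weights,
    mirroring the worst-case assumption that intruders take the
    most likely path through the RAG."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-d)
        if d > dist.get(node, math.inf):
            continue
        for nxt, p in graph[node]:
            nd = d - math.log(p)
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return 0.0

print(most_likely_path_risk(RAG, "internet", "mail_server"))  # 0.42 via mail_gateway
```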
In this work, we introduce a new Markov operator associated with a digraph, which we refer to as a nonlinear Laplacian. Unlike previous Laplacians for digraphs, the nonlinear Laplacian does not rely on the stationary distribution of the random walk process and is well defined on digraphs that are not strongly connected. We show that the nonlinear Laplacian has nontrivial eigenvalues and give a Cheeger-like inequality, which relates the conductance of a digraph and the smallest non-zero eigenvalue of its nonlinear Laplacian. Finally, we apply the nonlinear Laplacian to the analysis of real-world networks and obtain encouraging results.
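For reference, the classical Cheeger inequality for the normalized Laplacian of an undirected graph has the following shape; the paper proves an analogue relating a digraph's conductance to the smallest non-zero eigenvalue of its nonlinear Laplacian, with its own constants:

```latex
% Classical Cheeger inequality: lambda_2 is the second-smallest
% eigenvalue of the normalized Laplacian, phi(G) the conductance.
\[
  \frac{\lambda_2}{2} \;\le\; \phi(G) \;\le\; \sqrt{2\lambda_2}.
\]
% The digraph analogue in the paper uses the nonlinear Laplacian's
% smallest non-zero eigenvalue; its exact constants differ from these.
```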
This paper proposes a novel measure of edge significance that considers quantity propagation in a graph. Our method utilizes a pseudo propagation process obtained by solving a load-balancing problem on the nodes. Edge significance is defined as the difference in propagation between the graph with the edge and the graph without it. Our simulations compare the proposed method with traditional betweenness centrality in order to show how our measure differs from a centrality that considers the propagation process in a graph.
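The definition, propagation with the edge minus propagation without it, suggests a direct computation. The Python sketch below uses a toy neighbour-averaging rule as a stand-in for the paper's load-balancing propagation, which is an assumption on our part:

```python
def propagate(adj, loads, steps=50):
    """Toy propagation: each step, every node averages its load with
    its neighbours. The paper's pseudo propagation comes from solving
    a load-balancing problem; this rule is only a stand-in."""
    for _ in range(steps):
        new = {}
        for v, nbrs in adj.items():
            vals = [loads[v]] + [loads[u] for u in nbrs]
            new[v] = sum(vals) / len(vals)
        loads = new
    return loads

def edge_significance(adj, loads, edge):
    """Difference in propagation outcome with vs. without the edge."""
    u, v = edge
    without = {k: [n for n in nbrs if (k, n) not in ((u, v), (v, u))]
               for k, nbrs in adj.items()}
    a, b = propagate(adj, dict(loads)), propagate(without, dict(loads))
    return sum(abs(a[k] - b[k]) for k in adj)

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
loads = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
print(edge_significance(adj, loads, (2, 3)))
```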
Malware detection has been widely studied by analysing either file-dropping relationships or characteristics of the file distribution network. This paper, for the first time, studies a global heterogeneous malware delivery graph fusing file-dropping relationships and the topology of the file distribution network. The integration offers a unique ability to structure the end-to-end distribution relationship. However, it brings large heterogeneous graphs to the analysis. In our study, an average daily generated graph has more than 4 million edges and 2.7 million nodes that differ in type, such as IPs, URLs, and files. We propose a novel Bayesian label propagation model to unify the multi-source information, including content-agnostic features of different node types and topological information of the heterogeneous network. Our approach needs neither to examine source code nor to inspect the dynamic behaviour of a binary. Instead, it estimates the maliciousness of a given file through a semi-supervised label propagation procedure, which has linear time complexity w.r.t. the number of nodes and edges. An evaluation on 567 million real-world download events validates that our approach efficiently detects malware with high accuracy.
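Semi-supervised label propagation itself is compact to sketch: known-bad and known-good nodes are clamped, and every other node repeatedly averages its neighbours' scores, at a cost linear in the number of edges per iteration. The Python sketch below is a plain (non-Bayesian) variant over a hypothetical toy graph:

```python
def label_propagation(adj, seeds, iters=20):
    """Semi-supervised label propagation: each node's maliciousness
    score becomes the average of its neighbours' scores, with labeled
    seed nodes (known malware / benign) clamped. Each iteration is
    linear in the number of edges; the Bayesian weighting of the
    paper's model is omitted here."""
    score = {v: seeds.get(v, 0.5) for v in adj}
    for _ in range(iters):
        for v, nbrs in adj.items():
            if v in seeds or not nbrs:
                continue
            score[v] = sum(score[u] for u in nbrs) / len(nbrs)
    return score

# Hypothetical heterogeneous graph: file f2 links to a known-bad URL
# and a known-bad file, so its score is pushed toward 1.0.
adj = {"f1": ["url1"], "f2": ["url2", "f_bad"],
       "url1": ["f1"], "url2": ["f2"], "f_bad": ["f2"]}
seeds = {"f_bad": 1.0, "url2": 0.9, "f1": 0.0}
print(label_propagation(adj, seeds)["f2"])  # 0.95
```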
The growth of the internet era, with corporate dealings increasingly conducted online, has introduced crucial security challenges in cyberspace. Statistics from recent large-scale attacks define a new class of threat to the online world: the advanced persistent threat (APT), capable of impacting the national security and economic stability of any country. Among APTs, the botnet is one of the most well-articulated and stealthy vehicles for cybercrime. Botnet owners and their criminal organizations continuously develop innovative ways to infect new targets and exploit them. A botnet is a collection of compromised computers (bots) infected by automated software robots that interact to accomplish some distributed task and run without human intervention for illegal purposes. Botnets are mostly malicious in nature and allow cybercriminals to control the infected machines remotely without the victims' knowledge. They use various techniques, communication protocols, and topologies at different stages of their lifecycle, and they can upgrade their methods at any time. Botnets are global in nature, and their goal is to steal or destroy valuable information from organizations as well as individuals. In this paper we present a survey of real-world botnets (APTs).
We explicitly construct an extractor for two independent sources on n bits, each with polylogarithmic min-entropy. Our extractor outputs one bit and has polynomially small error. The best previous extractor, by Bourgain, required each source to have min-entropy .499n. A key ingredient in our construction is an explicit construction of a monotone, almost-balanced Boolean function that is resilient to coalitions. In fact, our construction is stronger in that it gives an explicit extractor for a generalization of non-oblivious bit-fixing sources on n bits, where some unknown n−q bits are chosen almost polylog(n)-wise independently, and the remaining q bits are chosen by an adversary as an arbitrary function of the n−q bits. The best previous construction, by Viola, achieved q quadratically smaller than our result. Our explicit two-source extractor directly implies improved constructions of a K-Ramsey graph over N vertices, improving bounds obtained by Barak et al. and matching independent work by Cohen.
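For context, the standard notion this abstract refers to: a function Ext is a (k, ε) two-source extractor with one output bit if, for all independent sources X and Y on {0,1}^n, each with min-entropy at least k,

```latex
% One-output-bit two-source extractor: for independent X, Y on {0,1}^n
% with H_inf(X), H_inf(Y) >= k,
\[
  \bigl|\Pr[\mathrm{Ext}(X,Y)=1] - \tfrac{1}{2}\bigr| \;\le\; \varepsilon .
\]
% The paper achieves k = polylog(n) with polynomially small epsilon,
% versus the previous requirement of k >= .499n (Bourgain).
```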
Moving target defense (MTD) is an emerging paradigm in which system defenses dynamically mutate in order to decrease the overall system attack surface. Though the concept is promising, implementations have not been widely adopted. The field has been actively researched for over ten years, yet has produced only a small number of widely adopted defenses, most notably address space layout randomization (ASLR). This is despite the fact that a variety of moving target implementations and proofs-of-concept currently exist. We suspect that this is because moving target controls break critical system dependencies from the perspectives of users and administrators, in addition to making things more difficult for attackers. As a result, the impact of the controls on overall system security is not sufficient to overcome the inconvenience imposed on legitimate system users. In this paper, we analyze a successful MTD approach. We study the control's dependency graphs, showing how we use graph-theoretic and network properties to predict the effectiveness of the selected control.
Go is a programming language developed at Google, with channel-based concurrent features based on CSP. Go can detect global communication deadlocks at runtime when all threads of execution are blocked, but deadlocks in other paths of execution could be undetected. We present a new static analyser for concurrent Go code to find potential communication errors such as communication mismatch and deadlocks at compile time. Our tool extracts the communication operations as session types, which are then converted into Communicating Finite State Machines (CFSMs). Finally, we apply a recent theoretical result on choreography synthesis to generate a global graph representing the overall communication pattern of a concurrent program. If the synthesis is successful, then the program is free from communication errors. We have implemented the technique in a tool, and applied it to analyse common Go concurrency patterns and an open source application with over 700 lines of code.
Most detection approaches, such as signature-based, anomaly-based, and specification-based detection, are unable to analyze and detect all types of malware. The signature-based approach has one major drawback: it cannot detect zero-day attacks. The fundamental limitation of the anomaly-based approach is its high false alarm rate. And specification-based detection often has difficulty specifying completely and accurately the entire set of valid behaviors a malware should exhibit. Modern malware developers try to avoid detection using several techniques, such as polymorphism, metamorphism, and various hiding techniques. In order to overcome these issues, we propose a new approach for malware analysis and detection consisting of the following twelve stages: Inbound Scan, Inbound Attack, Spontaneous Attack, Client-Side Exploit, Egg Download, Device Infection, Local Reconnaissance, Network Surveillance, Communications, Peer Coordination, Attack Preparation, and Malicious Outbound Propagation. All of these stages integrate together as an interrelated process in our proposed approach. This approach overcomes the limitations of the three existing approaches by monitoring the behavioral activity of malware at each and every stage of its life cycle, finally producing a report on the maliciousness of the files or software.
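As a rough illustration of how such a staged, interrelated pipeline might be wired together (the stage detectors here are hypothetical stubs, not the paper's monitors):

```python
# Minimal sketch of the staged analysis pipeline the abstract
# describes: run a detector per life-cycle stage, then aggregate
# an overall maliciousness report.
STAGES = [
    "inbound_scan", "inbound_attack", "spontaneous_attack",
    "client_side_exploit", "egg_download", "device_infection",
    "local_reconnaissance", "network_surveillance", "communications",
    "peer_coordination", "attack_preparation",
    "malicious_outbound_propagation",
]

def analyze(sample, detectors):
    """Run the per-stage detectors in order and aggregate a report."""
    report = {stage: detectors[stage](sample) for stage in STAGES}
    report["malicious"] = any(report[s] for s in STAGES)
    return report

# Hypothetical detectors: flag only the stages this sample exhibits.
detectors = {s: (lambda sample, s=s: s in sample["observed"]) for s in STAGES}
sample = {"observed": {"egg_download", "device_infection"}}
print(analyze(sample, detectors)["malicious"])  # True
```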