Biblio
This paper introduces DeepCheck, a new approach for validating Deep Neural Networks (DNNs) based on core ideas from program analysis, specifically from symbolic execution. DeepCheck implements techniques for lightweight symbolic analysis of DNNs and applies them in the context of image classification to address two challenging problems: 1) identification of important pixels (for attribution and adversarial generation); and 2) creation of adversarial attacks. Experimental results using the MNIST data-set show that DeepCheck's lightweight symbolic analysis provides a valuable tool for DNN validation.
Software deobfuscation is a key challenge in malware analysis to understand the internal logic of the code and establish adequate countermeasures. In order to defeat recent obfuscation techniques, state-of-the-art generic deobfuscation methodologies are based on dynamic symbolic execution (DSE). However, DSE suffers from limitations such as code coverage and scalability. In the race to counter and remove the most advanced obfuscation techniques, there is a need to reduce the amount of code to cover. To that extend, we propose a novel deobfuscation approach based on semantic equivalence, called DoSE. With DoSE, we aim to improve and complement DSE-based deobfuscation techniques by statically eliminating obfuscation transformations (built on code-reuse). This improves the code coverage. Our method's novelty comes from the transposition of existing binary diffing techniques, namely semantic equivalence checking, to the purpose of the deobfuscation of untreated techniques, such as two-way opaque constructs, that we encounter in surreptitious software. In order to challenge DoSE, we used both known malwares such as Cryptowall, WannaCry, Flame and BitCoinMiner and obfuscated code samples. Our experimental results show that DoSE is an efficient strategy of detecting obfuscation transformations based on code-reuse with low rates of false positive and/or false negative results in practice, and up to 63% of code reduction on certain types of malwares.
The capability of executing so-called smart contracts in a decentralised manner is one of the compelling features of modern blockchains. Smart contracts are fully fledged programs which cannot be changed once deployed to the blockchain. They typically implement the business logic of distributed apps and carry billions of dollars worth of coins. In that respect, it is imperative that smart contracts are correct and have no vulnerabilities or bugs. However, research has identified different classes of vulnerabilities in smart contracts, some of which led to prominent multi-million dollar fraud cases. In this paper we focus on vulnerabilities related to integer bugs, a class of bugs that is particularly difficult to avoid due to some characteristics of the Ethereum Virtual Machine and the Solidity programming language. In this paper we introduce Osiris – a framework that combines symbolic execution and taint analysis, in order to accurately find integer bugs in Ethereum smart contracts. Osiris detects a greater range of bugs than existing tools, while providing a better specificity of its detection. We have evaluated its performance on a large experimental dataset containing more than 1.2 million smart contracts. We found that 42,108 contracts contain integer bugs. Besides being able to identify several vulnerabilities that have been reported in the past few months, we were also able to identify a yet unknown critical vulnerability in a couple of smart contracts that are currently deployed on the Ethereum blockchain.
Automatic exploit generation is an open challenge. Existing solutions usually explore in depth the crashing paths, i.e., paths taken by proof-of-concept (POC) inputs triggering vulnerabilities, and generate exploits when exploitable states are found along the paths. However, exploitable states do not always exist in crashing paths. Moreover, existing solutions heavily rely on symbolic execution and are not scalable in path exploration and exploit generation. In addition, few solutions could exploit heap-based vulnerabilities. In this paper, we propose a new solution revery to search for exploitable states in paths diverging from crashing paths, and generate control-flow hijacking exploits for heap-based vulnerabilities. It adopts three novel techniques:(1) a digraph to characterize a vulnerability's memory layout and its contributor instructions;(2) a fuzz solution to explore diverging paths, which have similar memory layouts as the crashing paths, in order to search more exploitable states and generate corresponding diverging inputs;(3) a stitch solution to stitch crashing paths and diverging paths together, and synthesize EXP inputs able to trigger both vulnerabilities and exploitable states. We have developed a prototype of revery based on the binary analysis engine angr, and evaluated it on a set of 19 real world CTF (capture the flag) challenges. Experiment results showed that it could generate exploits for 9 (47%) of them, and generate EXP inputs able to trigger exploitable states for another 5 (26%) of them.
Compile-time specialization and feature pruning through static binary rewriting have been proposed repeatedly as techniques for reducing the attack surface of large programs, and for minimizing the trusted computing base. We propose a new approach to attack surface reduction: dynamic binary lifting and recompilation. We present BinRec, a binary recompilation framework that lifts binaries to a compiler-level intermediate representation (IR) to allow complex transformations on the captured code. After transformation, BinRec lowers the IR back to a "recovered" binary, which is semantically equivalent to the input binary, but does have its unnecessary features removed. Unlike existing approaches, which are mostly based on static analysis and rewriting, our framework analyzes and lifts binaries dynamically. The crucial advantage is that we can not only observe the full program including all of its dependencies, but we can also determine which program features the end-user actually uses. We evaluate the correctness and performance of BinRec, and show that our approach enables aggressive pruning of unwanted features in COTS binaries.
The USB protocol has become ubiquitous, supporting devices from high-powered computing devices to small embedded devices and control systems. USB's greatest feature, its openness and expandability, is also its weakness, and attacks such as BadUSB exploit the unconstrained functionality afforded to these devices as a vector for compromise. Fundamentally, it is virtually impossible to know whether a USB device is benign or malicious. This work introduces FirmUSB, a USB-specific firmware analysis framework that uses domain knowledge of the USB protocol to examine firmware images and determine the activity that they can produce. Embedded USB devices use microcontrollers that have not been well studied by the binary analysis community, and our work demonstrates how lifters into popular intermediate representations for analysis can be built, as well as the challenges of doing so. We develop targeting algorithms and use domain knowledge to speed up these processes by a factor of 7 compared to unconstrained fully symbolic execution. We also successfully find malicious activity in embedded 8051 firmwares without the use of source code. Finally, we provide insights into the challenges of symbolic analysis on embedded architectures and provide guidance on improving tools to better handle this important class of devices.
Untrusted third-party vendors and manufacturers have raised security concerns in hardware supply chain. Among all existing solutions, formal verification methods provide powerful solutions in detection malicious behaviors at the pre-silicon stage. However, little work have been done towards built-in hardware runtime verification at the post-silicon stage. In this paper, a runtime formal verification framework is proposed to evaluate the trust of hardware during its execution. This framework combines the symbolic execution and SAT solving methods to validate the user defined properties. The proposed framework has been demonstrated on an FPGA platform using an SoC design with untrusted IPs. The experimentation results show that the proposed approach can provide high-level security assurance for hardware at runtime.
Despite significant recent advances, the effectiveness of symbolic execution is limited when used to test complex, real-world software. One of the main scalability challenges is related to constraint solving: large applications and long exploration paths lead to complex constraints, often involving big arrays indexed by symbolic expressions. In this paper, we propose a set of semantics-preserving transformations for array operations that take advantage of contextual information collected during symbolic execution. Our transformations lead to simpler encodings and hence better performance in constraint solving. The results we obtain are encouraging: we show, through an extensive experimental analysis, that our transformations help to significantly improve the performance of symbolic execution in the presence of arrays. We also show that our transformations enable the analysis of new code, which would be otherwise out of reach for symbolic execution.
We present a Network Address Translator (NAT) written in C and proven to be semantically correct according to RFC 3022, as well as crash-free and memory-safe. There exists a lot of recent work on network verification, but it mostly assumes models of network functions and proves properties specific to network configuration, such as reachability and absence of loops. Our proof applies directly to the C code of a network function, and it demonstrates the absence of implementation bugs. Prior work argued that this is not feasible (i.e., that verifying a real, stateful network function written in C does not scale) but we demonstrate otherwise: NAT is one of the most popular network functions and maintains per-flow state that needs to be properly updated and expired, which is a typical source of verification challenges. We tackle the scalability challenge with a new combination of symbolic execution and proof checking using separation logic; this combination matches well the typical structure of a network function. We then demonstrate that formally proven correctness in this case does not come at the cost of performance. The NAT code, proof toolchain, and proofs are available at [58].
Cryptocurrencies record transactions in a decentralized data structure called a blockchain. Two of the most popular cryptocurrencies, Bitcoin and Ethereum, support the feature to encode rules or scripts for processing transactions. This feature has evolved to give practical shape to the ideas of smart contracts, or full-fledged programs that are run on blockchains. Recently, Ethereum's smart contract system has seen steady adoption, supporting tens of thousands of contracts, holding millions dollars worth of virtual coins. In this paper, we investigate the security of running smart contracts based on Ethereum in an open distributed network like those of cryptocurrencies. We introduce several new security problems in which an adversary can manipulate smart contract execution to gain profit. These bugs suggest subtle gaps in the understanding of the distributed semantics of the underlying platform. As a refinement, we propose ways to enhance the operational semantics of Ethereum to make contracts less vulnerable. For developers writing contracts for the existing Ethereum system, we build a symbolic execution tool called Oyente to find potential security bugs. Among 19, 336 existing Ethereum contracts, Oyente flags 8, 833 of them as vulnerable, including the TheDAO bug which led to a 60 million US dollar loss in June 2016. We also discuss the severity of other attacks for several case studies which have source code available and confirm the attacks (which target only our accounts) in the main Ethereum network.
Automated program repair is of great promise for future programming environments. It is also of obvious importance for patching vulnerabilities in software, or for building self-healing systems for critical infra-structure. Traditional program repair techniques tend to lift the fix from elsewhere in the program via syntax based approaches. In this talk, I will mention how the search problems in program repair can be solved by semantic analysis techniques. Here semantic analysis methods are not only used to guide the search, but also for extracting formal specifications from tests. I will conclude with positioning of the syntax based and semantic based methods for vulnerability patching, future generation programming, and self-healing systems.
Transformations form an important part of developing domain specific languages, where they are used to provide semantics for typing and evaluation. Yet, few solutions exist for verifying transformations written in expressive high-level transformation languages. We take a step towards that goal, by developing a general symbolic execution technique that handles programs written in these high-level transformation languages. We use logical constraints to describe structured symbolic values, including containment, acyclicity, simple unordered collections (sets) and to handle deep type-based querying of syntax hierarchies. We evaluate this symbolic execution technique on a collection of refactoring and model transformation programs, showing that the white-box test generation tool based on symbolic execution obtains better code coverage than a black box test generator for such programs in almost all tested cases.
Dynamic taint analysis and forward symbolic execution are quickly becoming staple techniques in security analyses. Example applications of dynamic taint analysis and forward symbolic execution include malware analysis, input filter generation, test case generation, and vulnerability discovery. Despite the widespread usage of these two techniques, there has been little effort to formally define the algorithms and summarize the critical issues that arise when these techniques are used in typical security contexts. The contributions of this paper are two-fold. First, we precisely describe the algorithms for dynamic taint analysis and forward symbolic execution as extensions to the run-time semantics of a general language. Second, we highlight important implementation choices, common pitfalls, and considerations when using these techniques in a security context.