Biblio
Despite decades of research on software diversification, only address space layout randomization has seen widespread adoption. Code randomization, an effective defense against return-oriented programming exploits, has remained an academic exercise mainly due to i) the lack of a transparent and streamlined deployment model that does not disrupt existing software distribution norms, and ii) the inherent incompatibility of program variants with error reporting, whitelisting, patching, and other operations that rely on code uniformity. In this work we present compiler-assisted code randomization (CCR), a hybrid approach that relies on compiler-rewriter cooperation to enable fast and robust fine-grained code randomization on end-user systems, while maintaining compatibility with existing software distribution models. The main concept behind CCR is to augment binaries with a minimal set of transformation-assisting metadata, which i) facilitate rapid fine-grained code transformation at installation or load time, and ii) form the basis for reversing any applied code transformation when needed, to maintain compatibility with existing mechanisms that rely on referencing the original code. We have implemented a prototype of this approach by extending the LLVM compiler toolchain, and developing a simple binary rewriter that leverages the embedded metadata to generate randomized variants using basic block reordering. The results of our experimental evaluation demonstrate the feasibility and practicality of CCR, as on average it incurs a modest file size increase of 11.46% and a negligible runtime overhead of 0.28%, while it is compatible with link-time optimization and control flow integrity.
Approaches for the automatic analysis of security policies on source code level cannot trivially be applied to binaries. This is due to the lacking high-level semantics of low-level object code, and the fundamental problem that control-flow recovery from binaries is difficult. We present a novel approach to recover the control-flow of binaries that is both safe and efficient. The key idea of our approach is to use the information contained in security mechanisms to approximate the targets of computed branches. To achieve this, we first define a restricted control transition intermediate language (RCTIL), which restricts the number of possible targets for each branch to a finite number of given targets. Based on this intermediate language, we demonstrate how a safe model of the control flow can be recovered without data-flow analyses. Our evaluation shows that that makes our solution more efficient than existing solutions.
Software verification has been well applied in safety critical areas and has shown the ability to provide better quality assurance for modern software. However, as lines of code and complexity of software systems increase, the scalability of verification becomes a challenge. In this paper, we present an automatic software verification framework TSV to address the scalability issues: (i) the extended structural abstraction and property-guided program slicing to solve large-scale program verification problem, saving time and memory without losing accuracy; (ii) automatically select different verification methods according to the program and property context to improve the verification efficiency. For evaluation, we compare TSV's different configurations with existing C program verifiers based on open benchmarks. We found that TSV with auto-selection performs better than with bounded model checking only or with extended structural abstraction only. Compared to existing tools such as CMBC and CPAChecker, it acquires 10%-20% improvement of accuracy and 50%-90% improvement of memory consumption.
Malicious applications have become increasingly numerous. This demands adaptive, learning-based techniques for constructing malware detection engines, instead of the traditional manual-based strategies. Prior work in learning-based malware detection engines primarily focuses on dynamic trace analysis and byte-level n-grams. Our approach in this paper differs in that we use compiler intermediate representations, i.e., the callgraph representation of binaries. Using graph-based program representations for learning provides structure of the program, which can be used to learn more advanced patterns. We use the Shortest Path Graph Kernel (SPGK) to identify similarities between call graphs extracted from binaries. The output similarity matrix is fed into a Support Vector Machine (SVM) algorithm to construct highly-accurate models to predict whether a binary is malicious or not. However, SPGK is computationally expensive due to the size of the input graphs. Therefore, we evaluate different parallelization methods for CPUs and GPUs to speed up this kernel, allowing us to continuously construct up-to-date models in a timely manner. Our hybrid implementation, which leverages both CPU and GPU, yields the best performance, achieving up to a 14.2x improvement over our already optimized OpenMP version. We compared our generated graph-based models to previously state-of-the-art feature vector 2-gram and 3-gram models on a dataset consisting of over 22,000 binaries. We show that our classification accuracy using graphs is over 19% higher than either n-gram model and gives a false positive rate (FPR) of less than 0.1%. We are also able to consider large call graphs and dataset sizes because of the reduced execution time of our parallelized SPGK implementation.
The area of secure compilation aims to design compilers which produce hardened code that can withstand attacks from low-level co-linked components. So far, there is no formal correctness criterion for secure compilers that comes with a clear understanding of what security properties the criterion actually provides. Ideally, we would like a criterion that, if fulfilled by a compiler, guarantees that large classes of security properties of source language programs continue to hold in the compiled program, even as the compiled program is run against adversaries with low-level attack capabilities. This paper provides such a novel correctness criterion for secure compilers, called trace-preserving compilation (TPC). We show that TPC preserves a large class of security properties, namely all safety hyperproperties. Further, we show that TPC preserves more properties than full abstraction, the de-facto criterion used for secure compilation. Then, we show that several fully abstract compilers described in literature satisfy an additional, common property, which implies that they also satisfy TPC. As an illustration, we prove that a fully abstract compiler from a typed source language to an untyped target language satisfies TPC.
Security protocols are critical components for the construction of secure and dependable distributed applications, but their implementation is challenging and error prone. Therefore, tools for formal modelling and analysis of security protocols can be potentially very useful to support software engineers. However, despite such tools have been available for a long time, their adoption outside the research community has been very limited. In fact, most practitioners find such applications too complex and hardly usable for their daily work. In this paper, we present an Integrated Development Environment for the design, verification and implementation of security protocols, aimed at lowering the adoption barrier of formal methods tools for security. In the spirit of Model Driven Development, the environment supports the user in the specification of the model using the simple and intuitive language AnB (and its extension AnBx). Moreover, it provides a push-button solution for the formal verification of the abstract and concrete models, and for the automatic generation of Java implementation. This Eclipse-based IDE leverages on existing languages and tools for modelling and verification of security protocols, such as the AnBx Compiler and Code Generator, the model checker OFMC and the protocol verifier ProVerif.
Cloud computing allows users to delegate data and computation to cloud service providers, at the cost of giving up physical control of their computing infrastructure. An attacker (e.g., insider) with physical access to the computing platform can perform various physical attacks, including probing memory buses and cold-boot style attacks. Previous work on secure (co-)processors provides hardware support for memory encryption and prevents direct leakage of sensitive data over the memory bus. However, an adversary snooping on the bus can still infer sensitive information from the memory access traces. Existing work on Oblivious RAM (ORAM) provides a solution for users to put all data in an ORAM; and accesses to an ORAM are obfuscated such that no information leaks through memory access traces. This method, however, incurs significant memory access overhead. This work is the first to leverage programming language techniques to offer efficient memory-trace oblivious program execution, while providing formal security guarantees. We formally define the notion of memory-trace obliviousness, and provide a type system for verifying that a program satisfies this property. We also describe a compiler that transforms a program into a structurally similar one that satisfies memory trace obliviousness. To achieve optimal efficiency, our compiler partitions variables into several small ORAM banks rather than one large one, without risking security. We use several example programs to demonstrate the efficiency gains our compiler achieves in comparison with the naive method of placing all variables in the same ORAM.
To instill greater confidence in computations outsourced to the cloud, clients should be able to verify the correctness of the results returned. To this end, we introduce Pinocchio, a built system for efficiently verifying general computations while relying only on cryptographic assumptions. With Pinocchio, the client creates a public evaluation key to describe her computation; this setup is proportional to evaluating the computation once. The worker then evaluates the computation on a particular input and uses the evaluation key to produce a proof of correctness. The proof is only 288 bytes, regardless of the computation performed or the size of the inputs and outputs. Anyone can use a public verification key to check the proof. Crucially, our evaluation on seven applications demonstrates that Pinocchio is efficient in practice too. Pinocchio's verification time is typically 10ms: 5-7 orders of magnitude less than previous work; indeed Pinocchio is the first general-purpose system to demonstrate verification cheaper than native execution (for some apps). Pinocchio also reduces the worker's proof effort by an additional 19-60x. As an additional feature, Pinocchio generalizes to zero-knowledge proofs at a negligible cost over the base protocol. Finally, to aid development, Pinocchio provides an end-to-end toolchain that compiles a subset of C into programs that implement the verifiable computation protocol.