Visible to the public Biblio

Filters: Keyword is LLVM  [Clear All Filters]
2023-04-28
Moses, William S., Narayanan, Sri Hari Krishna, Paehler, Ludger, Churavy, Valentin, Schanen, Michel, Hückelheim, Jan, Doerfert, Johannes, Hovland, Paul.  2022.  Scalable Automatic Differentiation of Multiple Parallel Paradigms through Compiler Augmentation. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis. :1–18.
Derivatives are key to numerous science, engineering, and machine learning applications. While existing tools generate derivatives of programs in a single language, modern parallel applications combine a set of frameworks and languages to leverage available performance and function in an evolving hardware landscape. We propose a scheme for differentiating arbitrary DAG-based parallelism that preserves scalability and efficiency, implemented into the LLVM-based Enzyme automatic differentiation framework. By integrating with a full-fledged compiler backend, Enzyme can differentiate numerous parallel frameworks and directly control code generation. Combined with its ability to differentiate any LLVM-based language, this flexibility permits Enzyme to leverage the compiler tool chain for parallel and differentiation-specitic optimizations. We differentiate nine distinct versions of the LULESH and miniBUDE applications, written in different programming languages (C++, Julia) and parallel frameworks (OpenMP, MPI, RAJA, Julia tasks, MPI.jl), demonstrating similar scalability to the original program. On benchmarks with 64 threads or nodes, we find a differentiation overhead of 3.4–6.8× on C++ and 5.4–12.5× on Julia.
2022-09-20
Ndemeye, Bosco, Hussain, Shahid, Norris, Boyana.  2021.  Threshold-Based Analysis of the Code Quality of High-Performance Computing Software Packages. 2021 IEEE 21st International Conference on Software Quality, Reliability and Security Companion (QRS-C). :222—228.
Many popular metrics used for the quantification of the quality or complexity of a codebase (e.g. cyclomatic complexity) were developed in the 1970s or 1980s when source code sizes were significantly smaller than they are today, and before a number of modern programming language features were introduced in different languages. Thus, the many thresholds that were suggested by researchers for deciding whether a given function is lacking in a given quality dimension need to be updated. In the pursuit of this goal, we study a number of open-source high-performance codes, each of which has been in development for more than 15 years—a characteristic which we take to imply good design to score them in terms of their source codes' quality and to relax the above-mentioned thresholds. First, we employ the LLVM/Clang compiler infrastructure and introduce a Clang AST tool to gather AST-based metrics, as well as an LLVM IR pass for those based on a source code's static call graph. Second, we perform statistical analysis to identify the reference thresholds of 22 code quality and callgraph-related metrics at a fine grained level.
2022-03-14
Bauer, Markus, Rossow, Christian.  2021.  NoVT: Eliminating C++ Virtual Calls to Mitigate Vtable Hijacking. 2021 IEEE European Symposium on Security and Privacy (EuroS P). :650—666.
The vast majority of nowadays remote code execution attacks target virtual function tables (vtables). Attackers hijack vtable pointers to change the control flow of a vulnerable program to their will, resulting in full control over the underlying system. In this paper, we present NoVT, a compiler-based defense against vtable hijacking. Instead of protecting vtables for virtual dispatch, our solution replaces them with switch-case constructs that are inherently control-flow safe, thus preserving control flow integrity of C++ virtual dispatch. NoVT extends Clang to perform a class hierarchy analysis on C++ source code. Instead of a vtable, each class gets unique identifier numbers which are used to dispatch the correct method implementation. Thereby, NoVT inherently protects all usages of a vtable, not just virtual dispatch. We evaluate NoVT on common benchmark applications and real-world programs including Chromium. Despite its strong security guarantees, NoVT improves runtime performance of most programs (mean overhead −0.5%, −3.7% min, 2% max). In addition, protected binaries are slightly smaller than unprotected ones. NoVT works on different CPU architectures and protects complex C++ programs against strong attacks like COOP and ShrinkWrap.
2021-03-15
Simon, L., Verma, A..  2020.  Improving Fuzzing through Controlled Compilation. 2020 IEEE European Symposium on Security and Privacy (EuroS P). :34–52.
We observe that operations performed by standard compilers harm fuzzing because the optimizations and the Intermediate Representation (IR) lead to transformations that improve execution speed at the expense of fuzzing. To remedy this problem, we propose `controlled compilation', a set of techniques to automatically re-factor a program's source code and cherry pick beneficial compiler optimizations to improve fuzzing. We design, implement and evaluate controlled compilation by building a new toolchain with Clang/LLVM. We perform an evaluation on 10 open source projects and compare the results of AFL to state-of-the-art grey-box fuzzers and concolic fuzzers. We show that when programs are compiled with this new toolchain, AFL covers 30 % new code on average and finds 21 additional bugs in real world programs. Our study reveals that controlled compilation often covers more code and finds more bugs than state-of-the-art fuzzing techniques, without the need to write a fuzzer from scratch or resort to advanced techniques. We identify two main reasons to explain why. First, it has proven difficult for researchers to appropriately configure existing fuzzers such as AFL. To address this problem, we provide guidelines and new LLVM passes to help automate AFL's configuration. This will enable researchers to perform a fairer comparison with AFL. Second, we find that current coverage-based evaluation measures (e.g. the total number of visited lines, edges or BBs) are inadequate because they lose valuable information such as which parts of a program a fuzzer actually visits and how consistently it does so. Coverage is considered a useful metric to evaluate a fuzzer's performance and devise a fuzzing strategy. However, the lack of a standard methodology for evaluating coverage remains a problem. To address this, we propose a rigorous evaluation methodology based on `qualitative coverage'. Qualitative coverage uniquely identifies each program line to help understand which lines are commonly visited by different fuzzers vs. which lines are visited only by a particular fuzzer. Throughout our study, we show the benefits of this new evaluation methodology. For example we provide valuable insights into the consistency of fuzzers, i.e. their ability to cover the same code or find the same bug across multiple independent runs. Overall, our evaluation methodology based on qualitative coverage helps to understand if a fuzzer performs better, worse, or is complementary to another fuzzer. This helps security practitioners adjust their fuzzing strategies.
Brauckmann, A., Goens, A., Castrillon, J..  2020.  ComPy-Learn: A toolbox for exploring machine learning representations for compilers. 2020 Forum for Specification and Design Languages (FDL). :1–4.
Deep Learning methods have not only shown to improve software performance in compiler heuristics, but also e.g. to improve security in vulnerability prediction or to boost developer productivity in software engineering tools. A key to the success of such methods across these use cases is the expressiveness of the representation used to abstract from the program code. Recent work has shown that different such representations have unique advantages in terms of performance. However, determining the best-performing one for a given task is often not obvious and requires empirical evaluation. Therefore, we present ComPy-Learn, a toolbox for conveniently defining, extracting, and exploring representations of program code. With syntax-level language information from the Clang compiler frontend and low-level information from the LLVM compiler backend, the tool supports the construction of linear and graph representations and enables an efficient search for the best-performing representation and model for tasks on program code.
2020-01-27
Kreindl, Jacob, Bonetta, Daniele, Mössenböck, Hanspeter.  2019.  Towards efficient, multi-language dynamic taint analysis. Proceedings of the 16th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes. :85–94.
Dynamic taint analysis is a program analysis technique in which data is marked and its propagation is tracked while the program is executing. It is applied to solve problems in many fields, especially in software security. Current taint analysis platforms are limited to a single programming language, and therefore cannot support programs which, as is common today, are implemented in multiple programming languages. Current implementations of dynamic taint analysis also incur a significant performance overhead. In this paper we address both these limitations (1) by presenting our vision of a multi-language dynamic taint analysis platform, which is built around a language-agnostic core framework that is extended by language-specific front-ends and (2) by discussing the use of speculative optimization and dynamic compilation to reduce the execution overhead of dynamic taint analysis applications. An implementation of such a platform would enable dynamic taint analyses that can target multiple languages in one analysis implementation and can track tainted data across language boundaries. We describe this approach in the context of the GraalVM runtime and its included JIT compiler, Graal, which allows us to target both dynamic and static languages.
2019-12-02
Simon, Laurent, Chisnall, David, Anderson, Ross.  2018.  What You Get is What You C: Controlling Side Effects in Mainstream C Compilers. 2018 IEEE European Symposium on Security and Privacy (EuroS P). :1–15.
Security engineers have been fighting with C compilers for years. A careful programmer would test for null pointer dereferencing or division by zero; but the compiler would fail to understand, and optimize the test away. Modern compilers now have dedicated options to mitigate this. But when a programmer tries to control side effects of code, such as to make a cryptographic algorithm execute in constant time, the problem remains. Programmers devise complex tricks to obscure their intentions, but compiler writers find ever smarter ways to optimize code. A compiler upgrade can suddenly and without warning open a timing channel in previously secure code. This arms race is pointless and has to stop. We argue that we must stop fighting the compiler, and instead make it our ally. As a starting point, we analyze the ways in which compiler optimization breaks implicit properties of crypto code; and add guarantees for two of these properties in Clang/LLVM. Our work explores what is actually involved in controlling side effects on modern CPUs with a standard toolchain. Similar techniques can and should be applied to other security properties; achieving intentions by compiler commands or annotations makes them explicit, so we can reason about them. It is already understood that explicitness is essential for cryptographic protocol security and for compiler performance; it is essential for language security too. We therefore argue that this should be only the first step in a sustained engineering effort.
2019-02-08
Kroes, Taddeus, Altinay, Anil, Nash, Joseph, Na, Yeoul, Volckaert, Stijn, Bos, Herbert, Franz, Michael, Giuffrida, Cristiano.  2018.  BinRec: Attack Surface Reduction Through Dynamic Binary Recovery. Proceedings of the 2018 Workshop on Forming an Ecosystem Around Software Transformation. :8-13.

Compile-time specialization and feature pruning through static binary rewriting have been proposed repeatedly as techniques for reducing the attack surface of large programs, and for minimizing the trusted computing base. We propose a new approach to attack surface reduction: dynamic binary lifting and recompilation. We present BinRec, a binary recompilation framework that lifts binaries to a compiler-level intermediate representation (IR) to allow complex transformations on the captured code. After transformation, BinRec lowers the IR back to a "recovered" binary, which is semantically equivalent to the input binary, but does have its unnecessary features removed. Unlike existing approaches, which are mostly based on static analysis and rewriting, our framework analyzes and lifts binaries dynamically. The crucial advantage is that we can not only observe the full program including all of its dependencies, but we can also determine which program features the end-user actually uses. We evaluate the correctness and performance of BinRec, and show that our approach enables aggressive pruning of unwanted features in COTS binaries.

2017-10-13
Barry, Thierno, Couroussé, Damien, Robisson, Bruno.  2016.  Compilation of a Countermeasure Against Instruction-Skip Fault Attacks. Proceedings of the Third Workshop on Cryptography and Security in Computing Systems. :1–6.

Physical attacks especially fault attacks represent one the major threats against embedded systems. In the state of the art, software countermeasures against fault attacks are either applied at the source code level where it will very likely be removed at compilation time, or at assembly level where several transformations need to be performed on the assembly code and lead to significant overheads both in terms of code size and execution time. This paper presents the use of compiler techniques to efficiently automate the application of software countermeasures against instruction-skip fault attacks. We propose a modified LLVM compiler that considers our security objectives throughout the compilation process. Experimental results illustrate the effectiveness of this approach on AES implementations running on an ARM-based microcontroller in terms of security overhead compared to existing solutions.