Improving Fuzzing through Controlled Compilation

Title: Improving Fuzzing through Controlled Compilation
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Simon, L., Verma, A.
Conference Name: 2020 IEEE European Symposium on Security and Privacy (EuroS&P)
Date Published: Sep
Keywords: afl, AFL's configuration, beneficial compiler optimizations, compiler security, compositionality, concolic fuzzers, controlled compilation, coverage, coverage measures, coverage metrics, current coverage-based evaluation measures, fuzzing, fuzzing consistency, fuzzing strategy, fuzzy set theory, grey-box fuzzers, Intermediate Representation, LLVM, Metrics, open source projects, program compilers, program line, program testing, pubcrawl, qualitative coverage, Resiliency, rigorous evaluation methodology, Scalability, security of data, source code, standard compilers
Abstract: We observe that operations performed by standard compilers harm fuzzing because the optimizations and the Intermediate Representation (IR) lead to transformations that improve execution speed at the expense of fuzzing. To remedy this problem, we propose 'controlled compilation', a set of techniques to automatically refactor a program's source code and cherry-pick beneficial compiler optimizations to improve fuzzing. We design, implement, and evaluate controlled compilation by building a new toolchain with Clang/LLVM. We perform an evaluation on 10 open source projects and compare the results of AFL to state-of-the-art grey-box fuzzers and concolic fuzzers. We show that when programs are compiled with this new toolchain, AFL covers 30% new code on average and finds 21 additional bugs in real-world programs. Our study reveals that controlled compilation often covers more code and finds more bugs than state-of-the-art fuzzing techniques, without the need to write a fuzzer from scratch or resort to advanced techniques. We identify two main reasons why. First, it has proven difficult for researchers to appropriately configure existing fuzzers such as AFL. To address this problem, we provide guidelines and new LLVM passes to help automate AFL's configuration. This will enable researchers to perform a fairer comparison with AFL. Second, we find that current coverage-based evaluation measures (e.g. the total number of visited lines, edges, or BBs) are inadequate because they lose valuable information, such as which parts of a program a fuzzer actually visits and how consistently it does so. Coverage is considered a useful metric to evaluate a fuzzer's performance and devise a fuzzing strategy. However, the lack of a standard methodology for evaluating coverage remains a problem. To address this, we propose a rigorous evaluation methodology based on 'qualitative coverage'.
Qualitative coverage uniquely identifies each program line to help understand which lines are commonly visited by different fuzzers vs. which lines are visited only by a particular fuzzer. Throughout our study, we show the benefits of this new evaluation methodology. For example, we provide valuable insights into the consistency of fuzzers, i.e. their ability to cover the same code or find the same bug across multiple independent runs. Overall, our evaluation methodology based on qualitative coverage helps to understand whether a fuzzer performs better than, worse than, or complementary to another fuzzer. This helps security practitioners adjust their fuzzing strategies.
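The idea behind qualitative coverage can be sketched with simple set operations: model each fuzzer's coverage as a set of uniquely identified program lines, then intersect and difference those sets to find commonly visited vs. fuzzer-specific lines. This is an illustrative sketch only, not the paper's toolchain; the coverage sets and the `qualitative_compare` helper are hypothetical.

```python
# Sketch of qualitative-coverage comparison (illustrative, not the paper's code).
# Coverage is modeled as a set of (file, line) identifiers per fuzzer,
# so each program line is uniquely identified across runs.

def qualitative_compare(cov_a, cov_b):
    """Return lines both fuzzers reach, plus lines unique to each."""
    common = cov_a & cov_b   # visited by both fuzzers
    only_a = cov_a - cov_b   # visited only by fuzzer A
    only_b = cov_b - cov_a   # visited only by fuzzer B
    return common, only_a, only_b

# Hypothetical coverage sets from two fuzzers over the same target:
afl_cov = {("parse.c", 10), ("parse.c", 11), ("emit.c", 42)}
other_cov = {("parse.c", 10), ("emit.c", 50)}

common, afl_only, other_only = qualitative_compare(afl_cov, other_cov)
print(sorted(common))       # lines both fuzzers cover
print(sorted(afl_only))     # lines only AFL covers
print(sorted(other_only))   # lines only the other fuzzer covers
```

Comparing the sets themselves, rather than their sizes, is what distinguishes this from quantitative metrics such as total line count: two fuzzers with equal counts may cover disjoint code and thus be complementary.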
DOI: 10.1109/EuroSP48549.2020.00011
Citation Key: simon_improving_2020