Biblio
Filters: Keyword is current coverage-based evaluation measures [Clear All Filters]
Improving Fuzzing through Controlled Compilation. 2020 IEEE European Symposium on Security and Privacy (EuroS P). :34–52.
.
2020. We observe that operations performed by standard compilers harm fuzzing because the optimizations and the Intermediate Representation (IR) lead to transformations that improve execution speed at the expense of fuzzing. To remedy this problem, we propose `controlled compilation', a set of techniques to automatically re-factor a program's source code and cherry pick beneficial compiler optimizations to improve fuzzing. We design, implement and evaluate controlled compilation by building a new toolchain with Clang/LLVM. We perform an evaluation on 10 open source projects and compare the results of AFL to state-of-the-art grey-box fuzzers and concolic fuzzers. We show that when programs are compiled with this new toolchain, AFL covers 30 % new code on average and finds 21 additional bugs in real world programs. Our study reveals that controlled compilation often covers more code and finds more bugs than state-of-the-art fuzzing techniques, without the need to write a fuzzer from scratch or resort to advanced techniques. We identify two main reasons to explain why. First, it has proven difficult for researchers to appropriately configure existing fuzzers such as AFL. To address this problem, we provide guidelines and new LLVM passes to help automate AFL's configuration. This will enable researchers to perform a fairer comparison with AFL. Second, we find that current coverage-based evaluation measures (e.g. the total number of visited lines, edges or BBs) are inadequate because they lose valuable information such as which parts of a program a fuzzer actually visits and how consistently it does so. Coverage is considered a useful metric to evaluate a fuzzer's performance and devise a fuzzing strategy. However, the lack of a standard methodology for evaluating coverage remains a problem. To address this, we propose a rigorous evaluation methodology based on `qualitative coverage'. Qualitative coverage uniquely identifies each program line to help understand which lines are commonly visited by different fuzzers vs. which lines are visited only by a particular fuzzer. Throughout our study, we show the benefits of this new evaluation methodology. For example we provide valuable insights into the consistency of fuzzers, i.e. their ability to cover the same code or find the same bug across multiple independent runs. Overall, our evaluation methodology based on qualitative coverage helps to understand if a fuzzer performs better, worse, or is complementary to another fuzzer. This helps security practitioners adjust their fuzzing strategies.