SaTC: CORE: Small: Machine Learning for Effective Fuzz Testing

Submitted by Koushik Sen on Thu, 03/14/2019 - 9:24am

Project Details

Lead PI

Koushik Sen

Co-PIs

Dawn Song

Performance Period

Oct 01, 2018 - Sep 30, 2021

Institution(s)

University of California - Berkeley

Award Number

1817122

In recent years, fuzz testing has emerged as one of the most effective techniques for finding security vulnerabilities and correctness bugs in real-world software systems. Major software companies have used it successfully for security testing and quality assurance. State-of-the-art fuzz testing tools have found numerous security vulnerabilities and bugs in widely used software such as web browsers, network tools, image processors, popular system libraries, C compilers, and interpreters.

Fuzz testing works by generating random input data for a program under test. A key reason for its popularity is its low computational overhead compared to more sophisticated techniques such as dynamic symbolic execution. While fuzz testing has been highly successful in practice, it has mostly been implemented in ad hoc ways, by combining a collection of hacks and best practices. As a result, fuzz testing tools usually generate many redundant test inputs and can take several days to weeks to find bugs. For complex input formats, such as C programs used as inputs to a C compiler, a large amount of manual tuning is required before fuzz testing generates valid test inputs.

This project proposes to make fuzz testing smarter and more effective by applying machine learning with customizable testing objectives. The proposed techniques will use probabilistic machine learning models, such as n-grams, recurrent neural networks (RNNs), recursive neural networks, or multi-armed bandits, to generate inputs from scratch or to mutate a set of seed inputs. The model will be trained so that the inputs it generates maximize the custom testing objective. We expect that such a model will generate fewer redundant inputs and can be customized to user-provided testing objectives. This project aims to contribute to the development of reliable, secure, and trustworthy software. The tools and techniques developed in this project will make it easier for programmers to write correct and secure programs.
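To make the idea of a learned model guiding mutation concrete, the following is a minimal illustrative sketch, not the project's actual tooling. It uses an epsilon-greedy multi-armed bandit (one of the model families named above) to choose among byte-level mutation operators, with the reward defined as the number of previously unseen branches an input reaches. Everything in it is an assumption made for illustration: the `toy_target` program, the three mutators, the `bandit_fuzz` function, and the choice of new-branch count as the testing objective.

```python
import random

def toy_target(data: bytes) -> set:
    """Hypothetical program under test; returns the ids of branches it executed."""
    branches = {"entry"}
    if len(data) > 4:
        branches.add("long")
    if data[:1] == b"F":
        branches.add("magic1")
        if data[1:2] == b"Z":
            branches.add("magic2")
    if b"\x00" in data:
        branches.add("nul")
    return branches

def flip_bit(data: bytes, rng: random.Random) -> bytes:
    if not data:
        return data
    out = bytearray(data)
    out[rng.randrange(len(out))] ^= 1 << rng.randrange(8)
    return bytes(out)

def insert_byte(data: bytes, rng: random.Random) -> bytes:
    i = rng.randrange(len(data) + 1)
    return data[:i] + bytes([rng.randrange(256)]) + data[i:]

def delete_byte(data: bytes, rng: random.Random) -> bytes:
    if not data:
        return data
    i = rng.randrange(len(data))
    return data[:i] + data[i + 1:]

MUTATORS = [flip_bit, insert_byte, delete_byte]  # the bandit's "arms"

def bandit_fuzz(seeds, target, iterations=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit over mutators; reward = number of new branches."""
    if not seeds:
        raise ValueError("at least one seed input is required")
    rng = random.Random(seed)
    corpus = list(seeds)
    covered = set()
    for s in corpus:
        covered |= target(s)
    pulls = [0] * len(MUTATORS)
    value = [0.0] * len(MUTATORS)  # running mean reward per mutator
    for _ in range(iterations):
        if rng.random() < epsilon:   # explore: pick a random mutator
            arm = rng.randrange(len(MUTATORS))
        else:                        # exploit: pick the best mutator so far
            arm = max(range(len(MUTATORS)), key=lambda a: value[a])
        child = MUTATORS[arm](rng.choice(corpus), rng)
        new = target(child) - covered
        pulls[arm] += 1
        value[arm] += (len(new) - value[arm]) / pulls[arm]
        if new:                      # keep only inputs that add coverage
            covered |= new
            corpus.append(child)
    return corpus, covered, pulls

if __name__ == "__main__":
    corpus, covered, pulls = bandit_fuzz([b"AAAA"], toy_target)
    print(sorted(covered), pulls)
```

In this sketch the testing objective lives in a single place, the reward computed from `new`. Substituting a different reward there (for example, execution time or the validity of the generated input) is one way the "customizable testing objective" described above could be expressed, although the project itself may define objectives quite differently.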
