Build It Break It Fix It: Measuring Secure Development
Presented as part of the 2016 HCSS conference.
ABSTRACT
Security failures plague our software infrastructure every day, and many specialists have proposed their own fixes. The programming languages community often asserts that if software were built from stronger stuff, these failures would occur less frequently. Security practitioners insist that developers must be trained in security and that security must be built in from day one. Static analysis vendors say their tools would identify the bugs before they are pushed to production. Authors of security-oriented libraries say their libraries are too simple for developers to misuse and that adopting them would improve the security of software.
Can we measure the security impact of programming languages and developer practices? We believe that we can, using a contest that we have developed: Build It Break It Fix It. Our hope is that this contest provides a source of data with which we can study and understand the relationship between security-critical failures in software and the manner in which that software was developed.
The format of the contest differs from past capture-the-flag and programming competitions. Our contest takes place over three phases. The first phase, Build It, has contestants create software according to a specification that we provide. The software may be written in any programming language as long as it compiles on a specific Linux virtual machine. The specification defines the correct behavior of the system as well as a basic threat model and security invariants that implementations must uphold. We provide automated unit testing of the applications, and we assign a score to each implementation based on performance properties of the application, for example execution time and the size of data generated.
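As a rough illustration only, a build score of this kind might gate a performance bonus on correctness; the weights, normalization, and function below are hypothetical and are not the contest's actual scoring formula.

```python
# Hypothetical sketch of a Build It score: correctness tests gate the score,
# and faster / smaller submissions earn more of the performance component.
# Weights and normalization are illustrative, not the contest's actual formula.

def build_score(tests_passed: int, tests_total: int,
                runtime_s: float, best_runtime_s: float,
                output_bytes: int, best_output_bytes: int) -> float:
    correctness = tests_passed / tests_total   # fraction of unit tests passed
    speed = best_runtime_s / runtime_s         # 1.0 for the fastest submission
    size = best_output_bytes / output_bytes    # 1.0 for the smallest output
    # Performance only counts in proportion to correctness.
    return 100.0 * correctness * (0.7 + 0.15 * speed + 0.15 * size)

print(build_score(48, 50, runtime_s=2.4, best_runtime_s=1.8,
                  output_bytes=12_000, best_output_bytes=9_500))
```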
In the second phase, Break It, contestants are given the source code of every other contestant's implementation and are told to find security bugs. These bugs are classified as correctness, confidentiality, or integrity bugs in the language of the original specification. In the final phase, Fix It, teams may respond to bug reports filed against their application by showing that several distinct reports refer to the same underlying bug in their system. At the end of the three phases, we have winners in two categories: building and breaking.
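The Fix It phase amounts to deduplication: if one fix resolves several reports, those reports are treated as a single bug. The sketch below illustrates the idea under assumed data structures (a report-to-fix mapping); it is not the contest infrastructure's actual code.

```python
# Hypothetical sketch of Fix It deduplication: bug reports resolved by the
# same fix are grouped as one underlying bug and count once, not many times.
from collections import defaultdict

def deduplicate(reports: list[str], report_to_fix: dict[str, str]) -> dict[str, list[str]]:
    """Group bug reports by the fix that resolves them."""
    bugs = defaultdict(list)
    for report in reports:
        fix = report_to_fix.get(report, report)  # unfixed reports stand alone
        bugs[fix].append(report)
    return dict(bugs)

reports = ["r1", "r2", "r3"]
mapping = {"r1": "fix-a", "r2": "fix-a"}    # r1 and r2 resolved by the same fix
print(deduplicate(reports, mapping))        # {'fix-a': ['r1', 'r2'], 'r3': ['r3']}
```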
We have run this contest multiple times with multiple specifications, and in our talk we will share our initial analysis of the data, our experiences running the experiment, and our plans for the future. Our contest runs have included independent contests with participants from the open Internet as well as contests held as a capstone exercise for a Coursera MOOC on software security.
We believe that the corpus of applications and specifications would be of interest to the application security community. This corpus represents the efforts of programmers with different levels of education, experience, and exposure to security topics to create secure software in a variety of programming languages.
We can compare and contrast these software artifacts, as well as use them to test the effectiveness of bug-finding systems and methodologies. We would also be interested in feedback on our experimental design and suggestions for future problem specifications to run as contests.
BIO
Andrew Ruef is a PhD student at the University of Maryland (UMD), College Park. His research focuses on programming languages and computer security. Before starting his graduate work, Andrew worked for ten years as a security researcher and developer of low-level and operating system software.