Visible to the public Affix: Micro-executing Binaries to Produce Static Analysis ModelsConflict Detection Enabled

Video: 

As a formal method, static program analysis is highly appealing: today's tools can reason about useful properties (e.g., freedom from memory safety violations) at a modest of precision even for large programs. However, this statement is true only for source programs; (arbitrary) machine code is much harder to analyze precisely. To deal with machine (aka binary) code, we have been developing a tool called Affix (Figure 1). It generates source-code models of binary programs which can be statically analyzed along with the source code that uses the binaries, thereby improving the results of the analysis (fewer false positives and negatives).

In this talk, we will describe how Affix works, and the steps we have taken to make its algorithm scale up. As a baseline, Affix uses micro execution, a technique for interpreting incomplete binary programs. Affix runs the API functions of a binary (e.g., based on its header file), setting input parameters to unknown. Whenever such an unknown is used, it is replaced by a type-correct but random value; e.g., if the unknown is a pointer, then dereferencing it generates memory (itself, unknown) that can be accessed. Affix performs a special kind of taint propagation on unknown values deriving from/to input/output parameters/returns, and from API functions of interest to the analysis, such as malloc, free, getenv, exec, etc. At the conclusion of micro execution, Affix analyzes the state of memory to derive flows of tainted values; from them, it generates a C-code model that approximates the observed behavior. This process works well; we have generated useful models for checking several interesting properties using Facebook's infer analyzer. We will demo Affix's process during the talk and show how the basic idea scales up.



Anwar Mamat is a senior lecturer at the Computer Science Department of the University of Maryland, and a part time software engineer at the Correct Computation. He is interested in cyber-physical systems and programming languages.

Contributor(s): 
Anwar Mamat
Ian Sweet
Michael Hicks