Biblio
Complex software is built by composing components implementing largely independent blocks of functionality. However, once the sources are compiled into an executable, that modularity is lost. This is unfortunate for code recipients, for whom knowing the components has many potential benefits, such as improved program understanding for reverse-engineering, identifying shared code across different programs, binary code reuse, and authorship attribution. A novel approach for decomposing such source-free program executables into components is here proposed. Given an executable, the approach first statically builds a decomposition graph, where nodes are functions and edges capture three types of relationships: code locality, data references, and function calls. It then applies a graph-theoretic approach to partition the functions into disjoint components. A prototype implementation, BCD, demonstrates the approach's efficacy: Evaluation of BCD with 25 C++ binary programs to recover the methods belonging to each class achieves high precision and recall scores for these tested programs.
Modern OS kernels including Windows, Linux, and Mac OS all have adopted kernel Address Space Layout Randomization (ASLR), which shifts the base address of kernel code and data into different locations in different runs. Consequently, when performing introspection or forensic analysis of kernel memory, we cannot use any pre-determined addresses to interpret the kernel events. Instead, we must derandomize the address space layout and use the new addresses. However, few efforts have been made to derandomize the kernel address space and yet there are many questions left such as which approach is more efficient and robust. Therefore, we present the first systematic study of how to derandomize a kernel when given a memory snapshot of a running kernel instance. Unlike the derandomization approaches used in traditional memory exploits in which only remote access is available, with introspection and forensics applications, we can use all the information available in kernel memory to generate signatures and derandomize the ASLR. In other words, there exists a large volume of solutions for this problem. As such, in this paper we examine a number of typical approaches to generate strong signatures from both kernel code and data based on the insight of how kernel code and data is updated, and compare them from efficiency (in terms of simplicity, speed etc.) and robustness (e.g., whether the approach is hard to be evaded or forged) perspective. In particular, we have designed four approaches including brute-force code scanning, patched code signature generation, unpatched code signature generation, and read-only pointer based approach, according to the intrinsic behavior of kernel code and data with respect to kernel ASLR. We have gained encouraging results for each of these approaches and the corresponding experimental results are reported in this paper.