Title | Scalable Framework for Accurate Binary Code Comparison |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Aslanyan, H., Avetisyan, A., Arutunian, M., Keropyan, G., Kurmangaleev, S., Vardanyan, V. |
Conference Name | 2017 Ivannikov ISPRAS Open Conference (ISPRAS) |
Keywords | accurate binary code comparison, Algorithm design and analysis, backward slicing, binary code clone detection, Binary code comparison, Binary codes, binary files, Binnavi platform, calculated heuristics, call graph, CG, Cloning, comparison algorithm, Computer bugs, empiric results heuristic method, exact matches, extensive binaries, forward slicing, functions matching process parallelization, graph theory, heavily modified functions, high matching quality, Human Behavior, IdaPro disassembler, invasive software, Libraries, main stages, malware analysis, matched vertices, matching process, maximum common subgraph, Metrics, old versions, PDG, practical applications, privacy, program debugging, program dependence graph, program diagnostics, Program slicing, programmatic changes, pubcrawl, real world libraries, resilience, Resiliency, scalable framework, Semantics, slightly modified functions, static analysis, statically linked libraries, target program, Tools, unchanged modified functions, well-known bugs prevention |
Abstract | Comparison of two binary files has many practical applications: the ability to detect programmatic changes between two versions, the ability to find old versions of statically linked libraries to prevent the use of well-known bugs, malware analysis, etc. In this article, a framework for comparison of binary files is presented. Framework uses IdaPro [1] disassembler and Binnavi [2] platform to recover structure of the target program and represent it as a call graph (CG). A program dependence graph (PDG) corresponds to each vertex of the CG. The proposed comparison algorithm consists of two main stages. At the first stage, several heuristics are applied to find the exact matches. Two functions are matched if at least one of the calculated heuristics is the same and unique in both binaries. At the second stage, backward and forward slicing is applied on matched vertices of CG to find further matches. According to empiric results heuristic method is effective and has high matching quality for unchanged or slightly modified functions. As a contradiction, to match heavily modified functions, binary code clone detection is used and it is based on finding maximum common subgraph for pair of PDGs. To achieve high performance on extensive binaries, the whole matching process is parallelized. The framework is tested on the number of real world libraries, such as python, openssh, openssl, libxml2, rsync, php, etc. Results show that in most cases more than 95% functions are truly matched. The tool is scalable due to parallelization of functions matching process and generation of PDGs and CGs. |
DOI | 10.1109/ISPRAS.2017.00013 |
Citation Key | aslanyan_scalable_2017 |