BinSequence: Fast, Accurate and Scalable Binary Code Reuse Detection

Title: BinSequence: Fast, Accurate and Scalable Binary Code Reuse Detection
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Huang, He; Youssef, Amr M.; Debbabi, Mourad
Conference Name: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security
Date Published: April 2017
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4944-4
Keywords: binary code reuse, binary code similarity comparison, bug search, Human Behavior, malware analysis, Metrics, patch analysis, privacy, pubcrawl, Resiliency
Abstract

Code reuse detection is a key technique in reverse engineering. However, existing source code similarity comparison techniques are not applicable to binary code. Compilers make the problem even harder, because they can generate different assembly code and control flow structures even for the same functionality. To address this problem, we present a fuzzy matching approach to compare two functions. We first obtain an initial mapping between basic blocks by applying the longest common subsequence (LCS) at both the basic block level and the execution path level. We then extend this mapping through neighborhood exploration. To make our approach applicable to large data sets, we designed an effective filtering process based on min-hashing. Based on the proposed approach, we implemented a tool named BinSequence and conducted extensive experiments with it. Our results show that, given a large assembly code repository with millions of functions, BinSequence is efficient and attains high-quality similarity rankings of assembly functions with an accuracy above 90%. We also present several practical use cases, including patch analysis, malware analysis, and bug search.
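The two building blocks named in the abstract, LCS-based comparison of basic blocks and min-hash filtering, can be illustrated with a minimal Python sketch. It is not BinSequence's implementation: the 1/0 match score on already-normalized instruction strings, the normalization by the longer block's length, the use of instruction strings as the min-hash feature set, and all function names (`lcs_length`, `block_similarity`, `minhash_signature`, `estimated_jaccard`) are hypothetical choices made for illustration. Path-level LCS and neighborhood exploration are not shown.

```python
import hashlib
from typing import List, Sequence, Set


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Classic O(len(a) * len(b)) dynamic-programming LCS length."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:  # assumed 1/0 score: exact match of normalized instructions
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def block_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical normalized similarity in [0, 1] between two basic blocks."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def _seeded_hash(seed: int, token: str) -> int:
    # Deterministic 64-bit hash; Python's built-in hash() is salted per process
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(features: Set[str], num_hashes: int = 64) -> List[int]:
    """One minimum per seeded hash function over a non-empty feature set."""
    if not features:
        raise ValueError("feature set must be non-empty")
    return [min(_seeded_hash(seed, f) for f in features) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of agreeing positions estimates the Jaccard similarity of the sets."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    blk1 = ["push REG", "mov REG, REG", "mov REG, IMM", "ret"]
    blk2 = ["push REG", "mov REG, REG", "xor REG, REG", "mov REG, IMM", "ret"]
    print(block_similarity(blk1, blk2))  # LCS = 4, longer block = 5 -> 0.8
    s1 = minhash_signature(set(blk1))
    s2 = minhash_signature(set(blk2))
    print(estimated_jaccard(s1, s2))  # a randomized estimate of the true Jaccard value, 4/5
```

In a filter of this kind, only candidate functions whose estimated Jaccard similarity clears some threshold would be passed on to the more expensive LCS comparison; the threshold value is left open here because the abstract does not state one.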

URL: https://dl.acm.org/doi/10.1145/3052973.3052974
DOI: 10.1145/3052973.3052974
Citation Key: huang_binsequence:_2017