Biblio
This paper argues about a new conceptual modeling language for the White-Box (WB) security analysis. In the WB security domain, an attacker may have access to the inner structure of an application or even the entire binary code. It becomes pretty easy for attackers to inspect, reverse engineer, and tamper the application with the information they steal. The basis of this paper is the 14 patterns developed by a leading provider of software protection technologies and solutions. We provide a part of a new modeling language named i-WBS (White-Box Security) to describe problems of WB security better. The essence of White-Box security problem is code security. We made the new modeling language focus on code more than ever before. In this way, developers who are not security experts can easily understand what they need to really protect.
Binary embedding is an effective way for nearest neighbor (NN) search as binary code is storage efficient and fast to compute. It tries to convert real-value signatures into binary codes while preserving similarity of the original data. However, it greatly decreases the discriminability of original signatures due to the huge loss of information. In this paper, we propose a novel method double-bit quantization and weighting (DBQW) to solve the problem by mapping each dimension to double-bit binary code and assigning different weights according to their spatial relationship. The proposed method is applicable to a wide variety of embedding techniques, such as SH, PCA-ITQ and PCA-RR. Experimental comparisons on two datasets show that DBQW for NN search can achieve remarkable improvements in query accuracy compared to original binary embedding methods.
The problem of cross-platform binary code similarity detection aims at detecting whether two binary functions coming from different platforms are similar or not. It has many security applications, including plagiarism detection, malware detection, vulnerability search, etc. Existing approaches rely on approximate graph-matching algorithms, which are inevitably slow and sometimes inaccurate, and hard to adapt to a new task. To address these issues, in this work, we propose a novel neural network-based approach to compute the embedding, i.e., a numeric vector, based on the control flow graph of each binary function, then the similarity detection can be done efficiently by measuring the distance between the embeddings for two functions. We implement a prototype called Gemini. Our extensive evaluation shows that Gemini outperforms the state-of-the-art approaches by large margins with respect to similarity detection accuracy. Further, Gemini can speed up prior art's embedding generation time by 3 to 4 orders of magnitude and reduce the required training time from more than 1 week down to 30 minutes to 10 hours. Our real world case studies demonstrate that Gemini can identify significantly more vulnerable firmware images than the state-of-the-art, i.e., Genius. Our research showcases a successful application of deep learning on computer security problems.
Most security software tools try to detect malicious components by cryptographic hashes, signatures or based on their behavior. The former, is a widely adopted approach based on Integrity Measurement Architecture (IMA) enabling appraisal and attestation of system components. The latter, however, may induce a very long time until misbehavior of a component leads to a successful detection. Another approach is a Dynamic Runtime Attestation (DRA) based on the comparison of binary code loaded in the memory and well-known references. Since DRA is a complex approach, involving multiple related components and often complex attestation strategies, a flexible and extensible architecture is needed. In a cooperation project an architecture was designed and a Proof of Concept (PoC) successfully developed and evaluated. To achieve needed flexibility and extensibility, the implementation facilitates central components providing attestation strategies (guidelines). These guidelines define and implement the necessary steps for all relevant attestation operations, i.e. measurement, reference generation and verification.