Visible to the public Biblio

Filters: Keyword is decompilation  [Clear All Filters]
2020-12-11
Slawinski, M., Wortman, A..  2019.  Applications of Graph Integration to Function Comparison and Malware Classification. 2019 4th International Conference on System Reliability and Safety (ICSRS). :16—24.

We classify .NET files as either benign or malicious by examining directed graphs derived from the set of functions comprising the given file. Each graph is viewed probabilistically as a Markov chain where each node represents a code block of the corresponding function, and by computing the PageRank vector (Perron vector with transport), a probability measure can be defined over the nodes of the given graph. Each graph is vectorized by computing Lebesgue antiderivatives of hand-engineered functions defined on the vertex set of the given graph against the PageRank measure. Files are subsequently vectorized by aggregating the set of vectors corresponding to the set of graphs resulting from decompiling the given file. The result is a fast, intuitive, and easy-to-compute glass-box vectorization scheme, which can be leveraged for training a standalone classifier or to augment an existing feature space. We refer to this vectorization technique as PageRank Measure Integration Vectorization (PMIV). We demonstrate the efficacy of PMIV by training a vanilla random forest on 2.5 million samples of decompiled. NET, evenly split between benign and malicious, from our in-house corpus and compare this model to a baseline model which leverages a text-only feature space. The median time needed for decompilation and scoring was 24ms. 11Code available at https://github.com/gtownrocks/grafuple.

2017-03-07
Nanda, Mangala Gowri, Arun-Kumar, S..  2016.  Decompiling Boolean Expressions from Java™ Bytecode. Proceedings of the 9th India Software Engineering Conference. :59–69.

Java bytecode obfuscates the original structure of a Java expression in the source code. So a simple expression such as (c1 {\textbackslash}textbar{\textbackslash}textbar c2) or (c1 && c2) may be captured in the bytecode in 4 different ways (as shown in the paper). And correspondingly, when we reconvert the bytecode back into Java source code, there are four different ways this may happen. Further, although gotos are not permitted in the Java source code, the bytecode is full of gotos. If you were to blindly convert the bytecode into Java source code, then you would replace a goto by a labeled break. A labeled break has the advantage that it only allows you to break out of a block structure and (unlike a setjump) does not permit you to jump arbitrarily into a block structure. So while the data structures used in the regenerated Java source code are still relatively "clean" arbitrary usage of labeled breaks makes for unreadable code (as we show in the paper). And this can be a point of concern, since decompilation is generally related to debugging code. Instead of dumping arbitrary labeled breaks, we try to reconstruct the original expression, in terms of && and {\textbackslash}textbar{\textbackslash}textbar clauses as well as ternary operators "?:" (c0 ? c1 : c2); Thus our goal is quite simply to regenerate, without using goto or labeled breaks, the expressions as close to the original as possible (it is not possible to guarantee an exact match). In this paper we explain what is the state of the art in Java decompilers for decoding complex expressions. Then we will present our solution. We have implemented the algorithms described here in this paper and give you our experience with it.