Biblio
Malware technology makes it difficult for malware analyst to detect same malware files with different obfuscation technique. In this paper we are trying to tackle that problem by analyzing the sequence of system call from an executable file. Malware files which actually are the same should have almost identical or at least a similar sequence of system calls. In this paper, we are going to create a model for each malware class consists of malwares from different families based on its sequence of system calls. Method/algorithm that's used in this paper is profile hidden markov model which is a very well-known tool in the biological informatics field for comparing DNA and protein sequences. Malware classes that we are going to build are trojan and worm class. Accuracy for these classes are pretty high, it's above 90% with also a high false positive rate around 37%.
Digital forensic investigators today are faced with numerous problems when recovering footprints of criminal activity that involve the use of computer systems. Investigators need the ability to recover evidence in a forensically sound manner, even when criminals actively work to alter the integrity, veracity, and provenance of data, applications and software that are used to support illicit activities. In many ways, operating systems (OS) can be strengthened from a technological viewpoint to support verifiable, accurate, and consistent recovery of system data when needed for forensic collection efforts. In this paper, we extend the ideas for forensic-friendly OS design by proposing the use of a practical form of computing on encrypted data (CED) and computing with encrypted functions (CEF) which builds upon prior work on component encryption (in circuits) and white-box cryptography (in software). We conduct experiments on sample programs to provide analysis of the approach based on security and efficiency, illustrating how component encryption can strengthen key OS functions and improve tamper-resistance to anti-forensic activities. We analyze the tradeoff space for use of the algorithm in a holistic approach that provides additional security and comparable properties to fully homomorphic encryption (FHE).
Although existing for decades, software tampering attack is still a main threat to systems, such as Android, and cyber physical systems. Many approaches have been proposed to thwart specific procedures of tampering, e.g., obfuscation and self-checksumming. However, none of them can achieve theoretically tamper-proof without the protection of hardware circuit. Rather than proposing new tricks against tampering attacks, we focus on impeding the replication of software tampering via program diversification, and thus pose a scalability barrier against the attacks. Our idea, namely N-version obfuscation (NVO), is to automatically generate and deliver same featured, but functionally nonequivalent software copies to different machines or users. In this paper, we investigate such an idea on Android platform. We carefully design a candidate NVO solution for networked apps, which leverages a Message Authentication Code (MAC) mechanism to generate the functionally nonequivalent diversities. Our evaluation result shows that the time required for breaking such a software system increases linearly with respect to the number of software versions. In this way, attackers would suffer great scalability issues, considering that an app can have millions of users. With minimal NVO costs, effective tamper-resistant security can therefore be established.
As a common practice in software development, program obfuscation aims at deterring reverse engineering and malicious attacks on released source or binary code. Owning ample obfuscation techniques, we have relatively little knowledge on how to most effectively use them. The biggest challenge lies in identifying the most useful combination of these techniques. We propose a unified framework to automatically generate and optimize obfuscation based on an obscurity language model and a Monte Carlo Markov Chain (MCMC) based search algorithm. We further instantiate it for JavaScript programs and developed the Closure* tool. Compared to the well-known Google Closure Compiler, Closure* outperforms its default setting by 26%. For programs which have already been well obfuscated, Closure* can still outperform by 22%.
Whether it is for conditional statement, constant, opaque predicate or equation obfuscation, Mixed Boolean Arithmetics (MBA) technique is a powerful tool providing concrete ways to achieve obfuscation. Recent papers ([22,1]) presented ways to mix such tools with permutation polynomials modulo 2n in order to make them more resilient to SMT solvers. However, because of limitations regarding the inversion of such permutations, the set of permutation polynomials presented suffer some restrictions. Such restrictions bring several methods of arithmetic simplification, decreasing their effectiveness at hiding information. In this work, we present general methods for permutation polynomials inversion. Those methods allow us to remove some of the restrictions presented in the literature, making simplification methods less effective. We discuss complexity and limits of these methods, and conclude that not only current simplification methods may not be as effective as we thought, but they are still many uses of polynomial permutations in obfuscation that are yet to be explored.
Obfuscation is a mechanism used to hinder reverse engineering of programs. To cope with the large number of obfuscated programs, especially malware, reverse engineers automate the process of deobfuscation i.e. extracting information from obfuscated programs. Deobfuscation techniques target specific obfuscation transformations, which requires reverse engineers to manually identify the transformations used by a program, in what is known as metadata recovery attack. In this paper, we present Oedipus, a Python framework that uses machine learning classifiers viz., decision trees and naive Bayes, to automate metadata recovery attacks against obfuscated programs. We evaluated Oedipus' performance using two datasets totaling 1960 unobfuscated C programs, which were used to generate 11.075 programs obfuscated using 30 configurations of 6 different obfuscation transformations. Our results empirically show the feasibility of using machine learning to implement the metadata recovery attacks with classification accuracies of 100% in some cases.
Motivated by applications in cryptography, we introduce and study the problem of distribution design. The goal of distribution design is to find a joint distribution on \$n\$ random variables that satisfies a given set of constraints on the marginal distributions. Each constraint can either require that two sequences of variables be identically distributed or, alternatively, that the two sequences have disjoint supports. We present several positive and negative results on the existence and efficiency of solutions for a given set of constraints. Distribution design can be seen as a strict generalization of several well-studied problems in cryptography. These include secret sharing, garbling schemes, and non-interactive protocols for secure multiparty computation. We further motivate the problem and our results by demonstrating their usefulness towards realizing non-interactive protocols for ad-hoc secure multiparty computation, in which any subset of the parties may choose to participate and the identity of the participants should remain hidden to the extent possible.
The rise of sensor-equipped smart phones has enabled a variety of classification based applications that provide personalized services based on user data extracted from sensor readings. However, malicious applications aggressively collect sensitive information from inherent user data without permissions. Furthermore, they can mine sensitive information from user data just in the classification process. These privacy threats raise serious privacy concerns. In this paper, we introduce two new privacy concerns which are inherent-data privacy and latent-data privacy. We propose a framework that enables a data-obfuscation mechanism to be developed easily. It preserves latent-data privacy while guaranteeing satisfactory service quality. The proposed framework preserves privacy against powerful adversaries who have knowledge of users' access pattern and the data-obfuscation mechanism. We validate our framework towards a real classification-orientated dataset. The experiment results confirm that our framework is superior to the basic obfuscation mechanism.
We present a unified framework for studying secure multiparty computation (MPC) with arbitrarily restricted interaction patterns such as a chain, a star, a directed tree, or a directed graph. Our study generalizes both standard MPC and recent models for MPC with specific restricted interaction patterns, such as those studied by Halevi et al. (Crypto 2011), Goldwasser et al. (Eurocrypt 2014), and Beimel et al. (Crypto 2014). Since restricted interaction patterns cannot always yield full security for MPC, we start by formalizing the notion of "best possible security" for any interaction pattern. We then obtain the following main results: Completeness theorem. We prove that the star interaction pattern is complete for the problem of MPC with general interaction patterns. Positive results. We present both information-theoretic and computationally secure protocols for computing arbitrary functions with general interaction patterns. We also present more efficient protocols for computing symmetric functions, both in the computational and in the information-theoretic setting. Our computationally secure protocols for general functions necessarily rely on indistinguishability obfuscation while the ones for computing symmetric functions make simple use of multilinear maps. Negative results. We show that, in many cases, the complexity of our information-theoretic protocols is essentially the best that can be achieved. All of our protocols rely on a correlated randomness setup, which is necessary in our setting (for computing general functions). In the computational case, we also present a generic procedure to make any correlated randomness setup reusable, in the common random string model. Although most of our information-theoretic protocols have exponential complexity, they may be practical for functions on small domains (e.g., f0; 1g20), where they are concretely faster than their computational counterparts.
We construct the first fully succinct garbling scheme for RAM programs, assuming the existence of indistinguishability obfuscation for circuits and one-way functions. That is, the size, space requirements, and runtime of the garbled program are the same as those of the input program, up to poly-logarithmic factors and a polynomial in the security parameter. The scheme can be used to construct indistinguishability obfuscators for RAM programs with comparable efficiency, at the price of requiring sub-exponential security of the underlying primitives. In particular, this opens the door to obfuscated computations that are sublinear in the length of their inputs. The scheme builds on the recent schemes of Koppula-Lewko-Waters and Canetti-Holmgren-Jain-Vaikuntanathan [STOC 15]. A key technical challenge here is how to combine the fixed-prefix technique of KLW, which was developed for deterministic programs, with randomized Oblivious RAM techniques. To overcome that, we develop a method for arguing about the indistinguishability of two obfuscated randomized programs that use correlated randomness. Along the way, we also define and construct garbling schemes that offer only partial protection. These may be of independent interest.
Anti-reverse engineering is one of the core technologies of software intellectual property protection, prevailing techniques of which are static and dynamic obfuscation. Static obfuscation can only prevent static analysis with code mutation done before execution by compressing, encrypting and obfuscating. Dynamic obfuscation can prevent both static and dynamic analysis, which changes code while being executed. Popular dynamic obfuscation techniques include self-modifying code and virtual machine protection. Despite the higher safety, dynamic obfuscation has its problems: 1) code appear in plain text remains a long time; 2) control flow is exposable; 3) time and space overheads are too big. This paper presents a binary protection scheme using dynamic fine-grained code hiding and obfuscation named dynFCHO. In this scheme, basic blocks to be protected are hidden in original code and will be restored while being executed. Code obfuscation is also implemented additionally to enhance safety. Experiments prove that dynFCHO can effectively resist static and dynamic analysis without destructing original software functions. It can be used on most binary programs compiled by standard compilers. This scheme can be widely used with the advantages of strong protection, light-weight implementation, and good extendibility.