CT-ISG: The Origin of the Code: Automated Identification of Common Characteristics in Malware

Project Details

Performance Period

Sep 01, 2008 - Jan 31, 2012

Institution(s)

North Carolina State University

Software is a common target of attacks on the current computing and communications infrastructure, and it remains vulnerable to attacks that exploit obscure or misunderstood language and program features. Detecting these software exploits (also called "malware") will therefore remain a necessary part of an effective defense for the foreseeable future. Virus checkers detect many known exploits and are now widely used, but attackers have adapted by obfuscating and mutating their code to evade them.

Such techniques make precise identification of malware extremely difficult. This project will instead identify attack code by its key characteristics. Important features of the approach include advanced disassembly techniques; translation of code into an intermediate form that is more amenable to analysis and more resistant to obfuscation; static reconstruction of program control flow and data flow; and extraction and analysis of properties of interest. These properties include the characteristic behaviors of encryption and compression, and the system calls executed by the code. Rather than relying on exact matching of these properties, malware will be identified by approximate matching. The focus will be on static analysis, which avoids the performance penalties of dynamic execution monitoring. The project will also investigate applying data mining to identify important malware features and to construct high-level patterns or signatures fully automatically. Finally, the method will help identify relationships among malware samples, with applications to forensics, recovery of attack strategies, and identification of new classes of attacks (including zero-day attacks).
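To make the contrast between exact and approximate matching concrete, here is a minimal Python sketch. It assumes that static analysis has already reduced each sample to an ordered list of system-call names. The n-gram windows, Jaccard similarity, 0.5 threshold, family names, and call sequences are all hypothetical choices for illustration; they are not the project's actual features or matching algorithm.

```python
"""Hypothetical sketch: approximate matching of samples by system-call n-grams.

Assumes each sample is already an ordered list of system-call names. The
similarity measure and threshold are illustrative, not the project's method.
"""
from __future__ import annotations

from typing import Sequence


def ngrams(calls: Sequence[str], n: int = 2) -> set[tuple[str, ...]]:
    """Return the set of length-n sliding windows over a call sequence."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(sample: Sequence[str],
               known: dict[str, Sequence[str]],
               n: int = 2,
               threshold: float = 0.5) -> tuple[str | None, float]:
    """Return the most similar known family and its score.

    The family name is None when no family reaches the (hypothetical)
    threshold, i.e. the sample is treated as unrecognized.
    """
    grams = ngrams(sample, n)
    best_name, best_score = None, 0.0
    for name, calls in known.items():
        score = jaccard(grams, ngrams(calls, n))
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical known families and a variant with one junk call inserted.
KNOWN = {
    "dropper_a": ["open", "read", "socket", "connect", "send", "close"],
    "logger_b": ["open", "read", "write", "close"],
}
VARIANT = ["open", "read", "getpid", "socket", "connect", "send", "close"]

if __name__ == "__main__":
    print(best_match(VARIANT, KNOWN))  # ('dropper_a', 0.5714285714285714)
```

Inserting a single junk call (`getpid`) means the variant no longer equals any stored sequence, so an exact-match signature would miss it. Here 4 of the 7 distinct call pairs are still shared with `dropper_a`, so the score drops only to 4/7 and the sample is still attributed to that family. A real system would need richer features and a tuned threshold, since a single inserted call already costs several shared n-grams.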

The method will resist injected noise and targeted evasion by malware writers, and will provide much better protection than conventional virus checkers against polymorphic and metamorphic exploit code and against new attack variants. A database of patterns and characteristics of known software exploits will be maintained and made public. Educational materials on malware detection will be developed and disseminated, and training female researchers will continue to be a priority.