Large-Scale Identification of Malicious Singleton Files
Title | Large-Scale Identification of Malicious Singleton Files |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Li, Bo, Roundy, Kevin, Gates, Chris, Vorobeychik, Yevgeniy |
Conference Name | Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-4523-1 |
Keywords | composability, cyber physical systems, False Data Detection, Human Behavior, large-scale malware detection, machine learning, pubcrawl, resilience, Resiliency, security, singleton files |
Abstract | We study a dataset of billions of program binary files that appeared on 100 million computers over the course of 12 months, discovering that 94% of these files were present on a single machine. Though malware polymorphism is one cause for the large number of singleton files, additional factors also contribute to polymorphism, given that the ratio of benign to malicious singleton files is 80:1. The huge number of benign singletons makes it challenging to reliably identify the minority of malicious singletons. We present a large-scale study of the properties, characteristics, and distribution of benign and malicious singleton files. We leverage the insights from this study to build a classifier based purely on static features to identify 92% of the remaining malicious singletons at a 1.4% false positive rate, despite heavy use of obfuscation and packing techniques by most malicious singleton files that we make no attempt to de-obfuscate. Finally, we demonstrate robustness of our classifier to important classes of automated evasion attacks. |
URL | https://dl.acm.org/citation.cfm?doid=3029806.3029815 |
DOI | 10.1145/3029806.3029815 |
Citation Key | li_large-scale_2017 |