Identifying Encrypted Malware Traffic with Contextual Flow Data
Title | Identifying Encrypted Malware Traffic with Contextual Flow Data |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Anderson, Blake; McGrew, David |
Conference Name | Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-4573-6 |
Keywords | artificial intelligence security, composability, Computational Intelligence, Encryption, machine learning, Malware, malware classification, Metrics, network monitoring, pubcrawl, Resiliency, Transport Layer Security, windows operating systems security |
Abstract | Identifying threats contained within encrypted network traffic poses a unique set of challenges. It is important to monitor this traffic for threats and malware, but do so in a way that maintains the integrity of the encryption. Because pattern matching cannot operate on encrypted data, previous approaches have leveraged observable metadata gathered from the flow, e.g., the flow's packet lengths and inter-arrival times. In this work, we extend the current state-of-the-art by considering a data omnia approach. To this end, we develop supervised machine learning models that take advantage of a unique and diverse set of network flow data features. These data features include TLS handshake metadata, DNS contextual flows linked to the encrypted flow, and the HTTP headers of HTTP contextual flows from the same source IP address within a 5 minute window. We begin by exhibiting the differences between malicious and benign traffic's use of TLS, DNS, and HTTP on millions of unique flows. This study is used to design the feature sets that have the most discriminatory power. We then show that incorporating this contextual information into a supervised learning system significantly increases performance at a 0.00% false discovery rate for the problem of classifying encrypted, malicious flows. We further validate our false positive rate on an independent, real-world dataset. |
URL | http://doi.acm.org/10.1145/2996758.2996768 |
DOI | 10.1145/2996758.2996768 |
Citation Key | anderson_identifying_2016 |
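
The abstract describes linking DNS and HTTP "contextual flows" to an encrypted flow when they come from the same source IP address within a 5 minute window. The following is only a minimal sketch of that linking step, not the authors' implementation. The `Flow` record, its field names, and the `contextual_flows` helper are hypothetical, and the window is assumed to be symmetric around the TLS flow's start time, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List

# Assumption: the "5 minute window" is measured as an absolute time
# difference from the TLS flow's start time (the abstract does not say).
WINDOW_SECONDS = 5 * 60


@dataclass(frozen=True)
class Flow:
    """Hypothetical minimal flow record; real flow exports carry more fields."""
    src_ip: str
    start_time: float  # seconds since the epoch
    protocol: str      # "tls", "dns", or "http"


def contextual_flows(tls_flow: Flow, candidates: List[Flow],
                     window: float = WINDOW_SECONDS) -> List[Flow]:
    """Return DNS/HTTP flows sharing the TLS flow's source IP within the window."""
    return [
        f for f in candidates
        if f.protocol in ("dns", "http")
        and f.src_ip == tls_flow.src_ip
        and abs(f.start_time - tls_flow.start_time) <= window
    ]
```

Features extracted from the returned flows (for example, DNS response data or HTTP header fields) could then be concatenated with the TLS handshake metadata before training a supervised classifier. That feature-extraction step is left out here because the abstract does not detail it.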