Identifying Encrypted Malware Traffic with Contextual Flow Data

Title: Identifying Encrypted Malware Traffic with Contextual Flow Data
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Anderson, Blake; McGrew, David
Conference Name: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4573-6
Keywords: artificial intelligence security, composability, Computational Intelligence, Encryption, machine learning, Malware, malware classification, Metrics, network monitoring, pubcrawl, Resiliency, Transport Layer Security, windows operating systems security
Abstract

Identifying threats contained within encrypted network traffic poses a unique set of challenges. It is important to monitor this traffic for threats and malware, but to do so in a way that maintains the integrity of the encryption. Because pattern matching cannot operate on encrypted data, previous approaches have leveraged observable metadata gathered from the flow, e.g., the flow's packet lengths and inter-arrival times. In this work, we extend the current state of the art by considering a data omnia approach. To this end, we develop supervised machine learning models that take advantage of a unique and diverse set of network flow data features. These data features include TLS handshake metadata, DNS contextual flows linked to the encrypted flow, and the HTTP headers of HTTP contextual flows from the same source IP address within a 5-minute window. We begin by exhibiting the differences between malicious and benign traffic's use of TLS, DNS, and HTTP on millions of unique flows. This study is used to design the feature sets that have the most discriminatory power. We then show that incorporating this contextual information into a supervised learning system significantly increases performance at a 0.00% false discovery rate for the problem of classifying encrypted, malicious flows. We further validate our false positive rate on an independent, real-world dataset.
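
The abstract describes linking DNS and HTTP "contextual flows" to an encrypted flow by source IP address and a 5-minute window. The sketch below illustrates one way such an association could be computed. It is not the authors' implementation: the `Flow` record, the `contextual_flows` helper, and the choice of a symmetric window around the TLS flow's start time are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# 5-minute window taken from the abstract; treating it as symmetric
# around the TLS flow's start time is an assumption of this sketch.
WINDOW_SECONDS = 5 * 60


@dataclass
class Flow:
    """Hypothetical minimal flow record (not a format from the paper)."""
    src_ip: str
    start: float    # flow start time, in seconds
    protocol: str   # "tls", "dns", or "http"


def contextual_flows(tls_flow: Flow, all_flows: List[Flow],
                     window: float = WINDOW_SECONDS) -> List[Flow]:
    """Return DNS/HTTP flows sharing tls_flow's source IP within `window` seconds."""
    return [
        f for f in all_flows
        if f.protocol in ("dns", "http")
        and f.src_ip == tls_flow.src_ip
        and abs(f.start - tls_flow.start) <= window
    ]


# Made-up example data, for illustration only.
flows = [
    Flow("10.0.0.5", 1000.0, "tls"),
    Flow("10.0.0.5", 900.0, "dns"),    # same host, 100 s earlier: linked
    Flow("10.0.0.5", 1200.0, "http"),  # same host, 200 s later: linked
    Flow("10.0.0.5", 1400.0, "http"),  # same host, 400 s later: outside window
    Flow("10.0.0.9", 1000.0, "dns"),   # different host: not linked
]
linked = contextual_flows(flows[0], flows)
```

In a full pipeline, features extracted from the linked flows (for example, DNS response details or HTTP header fields) would presumably be combined with the TLS handshake metadata before being passed to a supervised classifier; that step is omitted here.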

URL: http://doi.acm.org/10.1145/2996758.2996768
DOI: 10.1145/2996758.2996768
Citation Key: anderson_identifying_2016