DeepDGA: Adversarially-Tuned Domain Generation and Detection

Title: DeepDGA: Adversarially-Tuned Domain Generation and Detection
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Anderson, Hyrum S.; Woodbridge, Jonathan; Filar, Bobby
Conference Name: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4573-6
Keywords: artificial neural network, Artificial neural networks, Collaboration, Deep Learning, domain generation algorithms, generative adversarial networks, governance, Government, machine learning, policy, policy-based governance, pubcrawl, Resiliency
Abstract

Many malware families utilize domain generation algorithms (DGAs) to establish command and control (C&C) connections. While there are many methods to pseudorandomly generate domains, we focus in this paper on detecting (and generating) domains on a per-domain basis which provides a simple and flexible means to detect known DGA families. Recent machine learning approaches to DGA detection have been successful on fairly simplistic DGAs, many of which produce names of fixed length. However, models trained on limited datasets are somewhat blind to new DGA variants. In this paper, we leverage the concept of generative adversarial networks to construct a deep learning based DGA that is designed to intentionally bypass a deep learning based detector. In a series of adversarial rounds, the generator learns to generate domain names that are increasingly more difficult to detect. In turn, a detector model updates its parameters to compensate for the adversarially generated domains. We test the hypothesis of whether adversarially generated domains may be used to augment training sets in order to harden other machine learning models against yet-to-be-observed DGAs. We detail solutions to several challenges in training this character-based generative adversarial network. In particular, our deep learning architecture begins as a domain name auto-encoder (encoder + decoder) trained on domains in the Alexa one million. Then the encoder and decoder are reassembled competitively in a generative adversarial network (detector + generator), with novel neural architectures and training strategies to improve convergence.
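
The abstract describes a character-based model that operates on individual domain names. As a rough illustration only, a model of that kind typically consumes each name as a fixed-length sequence of integer character indices. The sketch below shows one way such preprocessing might look; the alphabet, the padding index 0, the length limit `MAX_LEN`, and the helper names `encode_domain` and `decode_domain` are all assumptions made for this example and are not taken from the paper.

```python
import string

# Assumed alphabet (not from the paper): lowercase letters, digits, hyphen, dot.
# Index 0 is reserved for padding, so real characters start at 1.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
INDEX_TO_CHAR = {i: ch for ch, i in CHAR_TO_INDEX.items()}

MAX_LEN = 16  # hypothetical fixed sequence length


def encode_domain(domain, max_len=MAX_LEN):
    """Map a domain name to a fixed-length list of character indices.

    Characters outside the assumed alphabet are dropped, long names are
    truncated, and short names are right-padded with 0.
    """
    indices = [CHAR_TO_INDEX[ch] for ch in domain.lower() if ch in CHAR_TO_INDEX]
    indices = indices[:max_len]
    return indices + [0] * (max_len - len(indices))


def decode_domain(indices):
    """Invert encode_domain, skipping padding positions."""
    return "".join(INDEX_TO_CHAR[i] for i in indices if i != 0)
```

Under these assumptions, `encode_domain("abc", max_len=5)` yields `[1, 2, 3, 0, 0]`, and decoding an encoded name that fits within the length limit returns the original lowercase string. An auto-encoder or detector of the kind the abstract mentions would sit on top of sequences like these.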

URL: http://doi.acm.org/10.1145/2996758.2996767
DOI: 10.1145/2996758.2996767
Citation Key: anderson_deepdga:_2016