DeepDGA: Adversarially-Tuned Domain Generation and Detection

Submitted by grigby1 on Mon, 11/20/2017 - 12:20pm

Title	DeepDGA: Adversarially-Tuned Domain Generation and Detection
Publication Type	Conference Paper
Year of Publication	2016
Authors	Anderson, Hyrum S., Woodbridge, Jonathan, Filar, Bobby
Conference Name	Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security
Publisher	ACM
Conference Location	New York, NY, USA
ISBN Number	978-1-4503-4573-6
Keywords	artificial neural network, Artificial neural networks, Collaboration, Deep Learning, domain generation algorithms, generative adversarial networks, governance, Government, machine learning, policy, policy-based governance, pubcrawl, Resiliency
Abstract	Many malware families use domain generation algorithms (DGAs) to establish command and control (C&C) connections. While there are many methods to pseudorandomly generate domains, we focus in this paper on detecting (and generating) domains on a per-domain basis, which provides a simple and flexible means to detect known DGA families. Recent machine learning approaches to DGA detection have been successful on fairly simplistic DGAs, many of which produce names of fixed length. However, models trained on limited datasets are somewhat blind to new DGA variants. In this paper, we leverage the concept of generative adversarial networks to construct a deep-learning-based DGA that is designed to intentionally bypass a deep-learning-based detector. In a series of adversarial rounds, the generator learns to generate domain names that are increasingly difficult to detect. In turn, a detector model updates its parameters to compensate for the adversarially generated domains. We test the hypothesis that adversarially generated domains can be used to augment training sets in order to harden other machine learning models against yet-to-be-observed DGAs. We detail solutions to several challenges in training this character-based generative adversarial network. In particular, our deep learning architecture begins as a domain name auto-encoder (encoder + decoder) trained on domains in the Alexa one million. Then the encoder and decoder are reassembled competitively in a generative adversarial network (detector + generator), with novel neural architectures and training strategies to improve convergence.
URL	http://doi.acm.org/10.1145/2996758.2996767
DOI	10.1145/2996758.2996767
Citation Key	anderson_deepdga:_2016
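
Illustrative sketch (not from the paper): the abstract describes adversarial rounds in which a generator produces domains that evade a detector, and the detector is then retrained on those domains. The toy Python below mirrors only that loop structure. Everything in it is an assumption made for illustration: the detector is a character-frequency naive Bayes scorer rather than a neural network, the "generator" merely samples random strings and keeps the ones the current detector scores as most benign, and all names (ToyDetector, generate_evasive, adversarial_rounds) and the sample data are hypothetical.

```python
import math
import random
import string
from collections import Counter

# Assumption: domain labels are restricted to lowercase letters and digits.
ALPHABET = string.ascii_lowercase + string.digits


def char_freqs(domains):
    """Laplace-smoothed character frequencies over a list of domain labels."""
    counts = Counter(ch for d in domains for ch in d if ch in ALPHABET)
    total = sum(counts.values()) + len(ALPHABET)
    return {ch: (counts[ch] + 1) / total for ch in ALPHABET}


class ToyDetector:
    """Stand-in detector: character-level naive Bayes, NOT a deep model."""

    def fit(self, benign, malicious):
        self.p_benign = char_freqs(benign)
        self.p_mal = char_freqs(malicious)
        return self

    def score(self, domain):
        """Mean per-character log-likelihood ratio; > 0 leans 'generated'."""
        chars = [c for c in domain if c in ALPHABET]
        if not chars:
            return 0.0
        llr = sum(math.log(self.p_mal[c] / self.p_benign[c]) for c in chars)
        return llr / len(chars)

    def is_dga(self, domain):
        return self.score(domain) > 0


def generate_evasive(detector, rng, n, pool=500, length=(6, 12)):
    """Stand-in generator: sample random labels, keep the n lowest-scoring."""
    candidates = [
        "".join(rng.choice(ALPHABET) for _ in range(rng.randint(*length)))
        for _ in range(pool)
    ]
    return sorted(candidates, key=detector.score)[:n]


def adversarial_rounds(benign, seed_malicious, rounds=3, per_round=20, seed=0):
    """Alternate generation and detector retraining for a few rounds."""
    rng = random.Random(seed)
    malicious = list(seed_malicious)
    detector = ToyDetector().fit(benign, malicious)
    history = []
    for _ in range(rounds):
        adversarial = generate_evasive(detector, rng, per_round)
        evaded_before = sum(not detector.is_dga(d) for d in adversarial)
        # Augment the malicious training set, then retrain the detector.
        malicious.extend(adversarial)
        detector = ToyDetector().fit(benign, malicious)
        evaded_after = sum(not detector.is_dga(d) for d in adversarial)
        history.append((evaded_before, evaded_after))
    return detector, malicious, history


if __name__ == "__main__":
    benign = ["google", "facebook", "wikipedia", "amazon", "youtube"]
    seed_dga = ["xk2j9qzv7w", "q8zx7vjk2p", "zz9xq7wvk3"]  # made-up examples
    _, augmented, history = adversarial_rounds(benign, seed_dga)
    for i, (before, after) in enumerate(history, 1):
        print(f"round {i}: evading before retrain={before}, after={after}")
    print(f"augmented malicious set size: {len(augmented)}")
```

The augmented list returned by adversarial_rounds corresponds to the augmented training set mentioned in the abstract; whether such augmentation actually hardens a given model is the hypothesis the paper tests, and this toy makes no claim about it.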
