A Semi-Supervised Learning Scheme to Detect Unknown DGA Domain Names Based on Graph Analysis

Title: A Semi-Supervised Learning Scheme to Detect Unknown DGA Domain Names Based on Graph Analysis
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Yan, Fan; Liu, Jia; Gu, Liang; Chen, Zelong
Conference Name: 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)
Date Published: December
Keywords: Botnet, DGA, feature extraction, graph analysis, graph theory, Human Behavior, machine learning, Malware, malware analysis, Metrics, Prediction algorithms, privacy, pubcrawl, resilience, Resiliency, reverse engineering, security, Semisupervised learning, supervised learning
Abstract: Many malware families use domain generation algorithms (DGAs) to randomly generate large numbers of domain names. This is an effective way to bypass conventional domain-name blacklists, because defenders cannot predict which of the randomly generated domain names will be selected for command and control (C&C) communications. An effective approach for detecting known DGA families is to reverse engineer the malware and recover the generation algorithm it uses. Because reverse engineering cannot handle variants of DGA families, some studies leverage supervised learning to find new variants. However, supervised learning offers low explainability and cannot find previously unseen DGA families. In this paper, we propose a graph-based semi-supervised learning scheme to track the evolution of known DGA families and find previously unseen DGA families. With a domain relation graph, we can clearly see how new variants relate to known DGA domain names, which yields better explainability. We deployed the proposed scheme in real network scenarios and show that it can not only find known DGA families comprehensively and precisely, but also find new DGA families that have not been seen before.
DOI: 10.1109/TrustCom50675.2020.00218
Citation Key: yan_semi-supervised_2020