Deep in the Dark - Deep Learning-Based Malware Traffic Detection Without Expert Knowledge

Submitted by grigby1 on Thu, 01/02/2020 - 2:41pm

Title	Deep in the Dark - Deep Learning-Based Malware Traffic Detection Without Expert Knowledge
Publication Type	Conference Paper
Year of Publication	2019
Authors	Marín, Gonzalo; Casas, Pedro; Capdehourat, Germán
Conference Name	2019 IEEE Security and Privacy Workshops (SPW)
Date Published	May
ISBN Number	978-1-7281-3508-3
Keywords	Analytical models, classification, Computer architecture, computer network security, Data models, Deep Learning, deep learning models, deep learning-based malware traffic detection, expert handcrafted features, expert systems, Expert Systems and Privacy, feature extraction, Human Behavior, human factors, invasive software, learning (artificial intelligence), machine learning-based systems, Malware, malware detection, malware network traffic, network security applications, network traffic, networking attacks, pattern classification, privacy, pubcrawl, Raw Measurements, raw-traffic feature representations, robust network security systems, Scalability, telecommunication traffic, Training
Abstract	With the ever-growing occurrence of networking attacks, robust network security systems are essential to prevent and mitigate their harmful effects. In recent years, machine learning-based systems have gained popularity for network security applications, usually relying on shallow models, where a set of expert handcrafted features is needed to pre-process the data before training. The main problem with this approach is that handcrafted features can fail to perform well across different kinds of scenarios and problems. Deep Learning models can address these kinds of issues through their ability to learn feature representations from raw or basic, unprocessed input data. In this paper we explore the power of deep learning models on the specific problem of detecting and classifying malware network traffic, using different representations for the input data. As a major advantage compared to the state of the art, we consider raw measurements coming directly from the stream of monitored bytes as the input to the proposed models, and evaluate different raw-traffic feature representations, including packet-level and flow-level ones. Our results suggest that deep learning models can better capture the underlying statistics of malicious traffic than classical, shallow-like models, even while operating in the dark, i.e., without any sort of expert handcrafted inputs.
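The abstract mentions packet-level and flow-level raw-traffic representations but gives no encoding details. The following is a minimal sketch, assuming one plausible encoding: each packet's raw bytes are truncated or zero-padded to a fixed length and scaled to [0, 1], and a flow is represented by stacking its first few packets. The sizes PACKET_BYTES and FLOW_PACKETS and the function names are hypothetical illustrations, not values or code from the paper.

```python
from typing import List, Sequence

# Hypothetical sizes chosen for illustration only; NOT taken from the paper.
PACKET_BYTES = 8   # raw bytes kept per packet
FLOW_PACKETS = 3   # packets kept per flow


def packet_representation(payload: bytes, n_bytes: int = PACKET_BYTES) -> List[float]:
    """Packet-level input: truncate or zero-pad raw bytes to n_bytes, scale each byte to [0, 1]."""
    clipped = payload[:n_bytes]
    padded = clipped + bytes(n_bytes - len(clipped))  # zero-padding is an assumption
    return [b / 255.0 for b in padded]


def flow_representation(packets: Sequence[bytes],
                        n_packets: int = FLOW_PACKETS,
                        n_bytes: int = PACKET_BYTES) -> List[List[float]]:
    """Flow-level input: an n_packets x n_bytes matrix built from the first packets of a flow.

    Flows shorter than n_packets are filled with all-zero rows (also an assumption).
    """
    rows = [packet_representation(p, n_bytes) for p in packets[:n_packets]]
    while len(rows) < n_packets:
        rows.append([0.0] * n_bytes)
    return rows


if __name__ == "__main__":
    print(packet_representation(bytes([0x45, 0x00, 0xFF])))
    print(flow_representation([b"\x01" * 20, b"\x02"]))
```

Fixed-size inputs like these could be fed directly to a deep model (for example, a 1-D convolutional network over bytes) without any handcrafted features; the truncation lengths and padding scheme are design choices one would tune, not details confirmed by this record.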
URL	https://ieeexplore.ieee.org/document/8844609
DOI	10.1109/SPW.2019.00019
Citation Key	marin_deep_2019
