Deep in the Dark - Deep Learning-Based Malware Traffic Detection Without Expert Knowledge

Title: Deep in the Dark - Deep Learning-Based Malware Traffic Detection Without Expert Knowledge
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Marín, Gonzalo; Casas, Pedro; Capdehourat, Germán
Conference Name: 2019 IEEE Security and Privacy Workshops (SPW)
Date Published: May
ISBN Number: 978-1-7281-3508-3
Keywords: Analytical models, classification, Computer architecture, computer network security, Data models, Deep Learning, deep learning models, deep learning-based malware traffic detection, expert handcrafted features, expert systems, Expert Systems and Privacy, feature extraction, Human Behavior, human factors, invasive software, learning (artificial intelligence), machine learning-based systems, Malware, malware detection, malware network traffic, network security applications, network traffic, networking attacks, pattern classification, privacy, pubcrawl, Raw Measurements, raw-traffic feature representations, robust network security systems, Scalability, telecommunication traffic, Training
Abstract

With the ever-growing occurrence of networking attacks, robust network security systems are essential to prevent and mitigate their harmful effects. In recent years, machine learning-based systems have gained popularity for network security applications, usually relying on shallow models that require a set of expert handcrafted features to pre-process the data before training. The main problem with this approach is that handcrafted features may perform poorly across different kinds of scenarios and problems. Deep learning models can address this issue through their ability to learn feature representations directly from raw or basic, unprocessed input data. In this paper we explore the power of deep learning models for the specific problem of detecting and classifying malware network traffic, using different representations of the input data. As a major advantage over the state of the art, we use raw measurements taken directly from the stream of monitored bytes as the input to the proposed models, and evaluate different raw-traffic feature representations, including packet-level and flow-level ones. Our results suggest that deep learning models capture the underlying statistics of malicious traffic better than classical, shallow models, even while operating in the dark, i.e., without any expert handcrafted inputs.
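The abstract states that the models consume raw bytes at packet and flow level, but this record does not describe the exact input layout. The sketch below shows one plausible way to turn captured bytes into fixed-length model inputs, assuming truncation or zero-padding to a fixed size and scaling each byte to [0, 1]. The constants `PACKET_BYTES`, `FLOW_PACKETS`, `FLOW_BYTES` and the functions `packet_representation` and `flow_representation` are hypothetical illustrations, not the authors' confirmed preprocessing.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only; NOT taken from the paper.
PACKET_BYTES = 1024  # bytes kept per packet in the packet-level view
FLOW_PACKETS = 2     # packets kept per flow in the flow-level view
FLOW_BYTES = 100     # bytes kept from each of those packets


def packet_representation(payload: bytes, n_bytes: int = PACKET_BYTES) -> np.ndarray:
    """Truncate or zero-pad one packet's raw bytes and scale them to [0, 1].

    Assumption: bytes are used as-is (no header parsing, no handcrafted features).
    """
    buf = np.zeros(n_bytes, dtype=np.float32)
    raw = np.array(list(payload[:n_bytes]), dtype=np.float32)
    buf[: raw.size] = raw / 255.0
    return buf


def flow_representation(packets, n_packets: int = FLOW_PACKETS,
                        n_bytes: int = FLOW_BYTES) -> np.ndarray:
    """Concatenate the first n_bytes of the first n_packets of a flow.

    Assumption: flows shorter than n_packets are padded with all-zero packets.
    """
    rows = [packet_representation(p, n_bytes) for p in packets[:n_packets]]
    while len(rows) < n_packets:
        rows.append(np.zeros(n_bytes, dtype=np.float32))
    return np.concatenate(rows)


if __name__ == "__main__":
    pkt = bytes([0x45, 0x00, 0xFF])  # toy 3-byte "packet"
    print(packet_representation(pkt).shape)  # fixed-length packet vector
    print(flow_representation([pkt]).shape)  # fixed-length flow vector
```

Because every vector has a fixed length, a batch of packets or flows can be stacked with `np.stack` and fed to a 1D convolutional or recurrent model without any feature engineering; the specific sizes above are only illustrative.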

URL: https://ieeexplore.ieee.org/document/8844609
DOI: 10.1109/SPW.2019.00019
Citation Key: marin_deep_2019