Biblio
With the ever-growing occurrence of networking attacks, robust network security systems are essential to prevent and mitigate their harming effects. In recent years, machine learning-based systems have gain popularity for network security applications, usually considering the application of shallow models, where a set of expert handcrafted features are needed to pre-process the data before training. The main problem with this approach is that handcrafted features can fail to perform well given different kinds of scenarios and problems. Deep Learning models can solve this kind of issues using their ability to learn feature representations from input raw or basic, non-processed data. In this paper we explore the power of deep learning models on the specific problem of detection and classification of malware network traffic, using different representations for the input data. As a major advantage as compared to the state of the art, we consider raw measurements coming directly from the stream of monitored bytes as the input to the proposed models, and evaluate different raw-traffic feature representations, including packet and flow-level ones. Our results suggest that deep learning models can better capture the underlying statistics of malicious traffic as compared to classical, shallow-like models, even while operating in the dark, i.e., without any sort of expert handcrafted inputs.
Continuous and adaptive learning is an effective learning approach when dealing with highly dynamic and changing scenarios, where concept drift often happens. In a continuous, stream or adaptive learning setup, new measurements arrive continuously and there are no boundaries for learning, meaning that the learning model has to decide how and when to (re)learn from these new data constantly. We address the problem of adaptive and continual learning for network security, building dynamic models to detect network attacks in real network traffic. The combination of fast and big network measurements data with the re-training paradigm of adaptive learning imposes complex challenges in terms of data processing speed, which we tackle by relying on big data platforms for parallel stream processing. We build and benchmark different adaptive learning models on top of a novel big data analytics platform for network traffic monitoring and analysis tasks, and show that high speed-up computations (as high as × 6) can be achieved by parallelizing off-the-shelf stream learning approaches.
Data Stream Machine Learning is rapidly gaining popularity within the network monitoring community as the big data produced by network devices and end-user terminals goes beyond the memory constraints of standard monitoring equipment. Critical network monitoring applications such as the detection of anomalies, network attacks and intrusions, require fast and continuous mechanisms for on-line analysis of data streams. In this paper we consider a stream-based machine learning approach for network security and anomaly detection, applying and evaluating multiple machine learning algorithms in the analysis of continuously evolving network data streams. The continuous evolution of the data stream analysis algorithms coming from the data stream mining domain, as well as the multiple evaluation approaches conceived for benchmarking such kind of algorithms makes it difficult to choose the appropriate machine learning model. Results of the different approaches may significantly differ and it is crucial to determine which approach reflects the algorithm performance the best. We therefore compare and analyze the results from the most recent evaluation approaches for sequential data on commonly used batch-based machine learning algorithms and their corresponding stream-based extensions, for the specific problem of on-line network security and anomaly detection. Similar to our previous findings when dealing with off-line machine learning approaches for network security and anomaly detection, our results suggest that adaptive random forests and stochastic gradient descent models are able to keep up with important concept drifts in the underlying network data streams, by keeping high accuracy with continuous re-training at concept drift detection times.