Biblio
Continuous and adaptive learning is an effective learning approach when dealing with highly dynamic and changing scenarios, where concept drift often happens. In a continuous, stream or adaptive learning setup, new measurements arrive continuously and there are no boundaries for learning, meaning that the learning model has to decide how and when to (re)learn from these new data constantly. We address the problem of adaptive and continual learning for network security, building dynamic models to detect network attacks in real network traffic. The combination of fast and big network measurements data with the re-training paradigm of adaptive learning imposes complex challenges in terms of data processing speed, which we tackle by relying on big data platforms for parallel stream processing. We build and benchmark different adaptive learning models on top of a novel big data analytics platform for network traffic monitoring and analysis tasks, and show that high speed-up computations (as high as × 6) can be achieved by parallelizing off-the-shelf stream learning approaches.
Data Stream Machine Learning is rapidly gaining popularity within the network monitoring community as the big data produced by network devices and end-user terminals goes beyond the memory constraints of standard monitoring equipment. Critical network monitoring applications such as the detection of anomalies, network attacks and intrusions, require fast and continuous mechanisms for on-line analysis of data streams. In this paper we consider a stream-based machine learning approach for network security and anomaly detection, applying and evaluating multiple machine learning algorithms in the analysis of continuously evolving network data streams. The continuous evolution of the data stream analysis algorithms coming from the data stream mining domain, as well as the multiple evaluation approaches conceived for benchmarking such kind of algorithms makes it difficult to choose the appropriate machine learning model. Results of the different approaches may significantly differ and it is crucial to determine which approach reflects the algorithm performance the best. We therefore compare and analyze the results from the most recent evaluation approaches for sequential data on commonly used batch-based machine learning algorithms and their corresponding stream-based extensions, for the specific problem of on-line network security and anomaly detection. Similar to our previous findings when dealing with off-line machine learning approaches for network security and anomaly detection, our results suggest that adaptive random forests and stochastic gradient descent models are able to keep up with important concept drifts in the underlying network data streams, by keeping high accuracy with continuous re-training at concept drift detection times.