A Learning-based Data Augmentation for Network Anomaly Detection

Submitted by aekwall on Mon, 03/29/2021 - 11:58am

Title	A Learning-based Data Augmentation for Network Anomaly Detection
Publication Type	Conference Paper
Year of Publication	2020
Authors	Olaimat, M. Al, Lee, D., Kim, Y., Kim, J., Kim, J.
Conference Name	2020 29th International Conference on Computer Communications and Networks (ICCCN)
Date Published	August
Keywords	anomaly detection, attack instances, augments data, class imbalance problem, data augmentation, data handling, data instances, Data models, Divide-Augment-Combine, divide-augment-combine strategy, Generative Adversarial Learning, generative adversarial model, generative adversarial network, generative adversarial networks, Generators, high-quality data, learning (artificial intelligence), learning-based data augmentation, machine learning, network anomaly detection, network traffic, network traffic traces, neural nets, pattern classification, Predictive Metrics, pubcrawl, public network datasets, Resiliency, sampling methods, Scalability, security of data, statistical sampling, Support vector machines, synthetic instances
Abstract	While machine learning technologies have advanced remarkably over the past several years, a fundamental requirement for the success of learning-based approaches is the availability of high-quality data that thoroughly represent the individual classes in a problem space. Unfortunately, many datasets show significant class imbalance, with only a few instances of minority classes; network traffic traces, for example, are highly skewed toward normal connections and contain very few attack instances. A well-known approach to addressing the class imbalance problem is data augmentation, which generates synthetic instances belonging to minority classes. However, traditional statistical techniques may be limited, since data extended through statistical sampling have the same density as the original instances, with only a minor degree of variation. This paper takes a learning-based approach to data augmentation to enable effective network anomaly detection. A critical challenge for the learning-based approach is the mode collapse problem, which limits the diversity of generated samples and which we also observed in our preliminary experiments. To address this, we present a novel "Divide-Augment-Combine" (DAC) strategy, which groups the instances based on their characteristics and augments the data group by group, using a generative adversarial model to represent each subset independently. Our experiments with two recently collected public network datasets (UNSW-NB15 and IDS-2017) show that the proposed technique improves performance in identifying network anomalies by up to 21.5%.
DOI	10.1109/ICCCN49398.2020.9209598
Citation Key	olaimat_learning-based_2020
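
The Divide-Augment-Combine idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how instances are grouped or how the generative adversarial model is built, so the sketch assumes a small k-means for the "divide" step and substitutes a per-group Gaussian sampler for the GAN in the "augment" step. All function names and parameters (divide, augment, divide_augment_combine, k, n_new_total) are hypothetical.

```python
import random
from statistics import mean, pstdev


def divide(instances, k, rng, iters=20):
    """Divide: group instances with a tiny k-means (an assumed grouping rule)."""
    centroids = rng.sample(instances, k)
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in instances:
            # Assign each instance to the centroid with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])),
            )
            groups[nearest].append(x)
        # Recompute centroids; keep the old one if a group came out empty.
        centroids = [
            [mean(col) for col in zip(*g)] if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return [g for g in groups if g]


def augment(group, n_new, rng):
    """Augment: stand-in generator that fits one Gaussian per feature.

    The paper trains a generative adversarial model per group; a Gaussian
    sampler is swapped in here only to keep the sketch self-contained.
    """
    cols = list(zip(*group))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]
    return [[rng.gauss(m, s) for m, s in zip(mus, sigmas)] for _ in range(n_new)]


def divide_augment_combine(minority, k, n_new_total, seed=0):
    """Combine: original minority instances plus synthetic ones from every group."""
    rng = random.Random(seed)
    groups = divide(minority, k, rng)
    synthetic = []
    for i, g in enumerate(groups):
        # Split the synthetic budget evenly across groups (an assumption).
        share = n_new_total // len(groups)
        if i < n_new_total % len(groups):
            share += 1
        synthetic.extend(augment(g, share, rng))
    return minority + synthetic


# Toy minority class with two visibly different kinds of attack instances.
minority = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
augmented = divide_augment_combine(minority, k=2, n_new_total=10)
print(len(augmented))  # 6 original + 10 synthetic = 16
```

Fitting one generator per group rather than one for the whole minority class is what the strategy relies on to counter mode collapse: each subset gets its own model, so a small group is not crowded out by a larger one.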
