A Comparison of Performance Metrics with Severely Imbalanced Network Security Big Data

Title: A Comparison of Performance Metrics with Severely Imbalanced Network Security Big Data
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Hasanin, Tawfiq; Khoshgoftaar, Taghi M.; Leevy, Joffrey L.
Conference Name: 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)
Keywords: apache spark, Apache Spark framework, area under the receiver operating characteristic curve, Big Data, Big Data analytics, big data security metrics, Cluster computing, computer network security, Data analysis, data mining, Geometric Mean, imbalanced data, learning (artificial intelligence), machine learning, majority class, Measurement, Metrics, metrics testing, minority classes, pattern classification, Performance Metrics, Precision-Recall Curve, Predictive models, pubcrawl, Radio frequency, resilience, Resiliency, sampling, sampling distribution ratio, sampling methods, Scalability, severe class imbalance, severely imbalanced network security big data, Sparks, testing dataset roles, Training
Abstract

Severe class imbalance between the majority and minority classes in large datasets can prejudice Machine Learning classifiers toward the majority class. Our work uniquely consolidates two case studies, each utilizing three learners implemented within an Apache Spark framework, six sampling methods, and five sampling distribution ratios to analyze the effect of severe class imbalance on big data analytics. We use three performance metrics to evaluate this study: Area Under the Receiver Operating Characteristic Curve, Area Under the Precision-Recall Curve, and Geometric Mean. In the first case study, models were trained on one dataset (POST) and tested on another (SlowlorisBig). In the second case study, the training and testing dataset roles were switched. Our comparison of performance metrics shows that Area Under the Precision-Recall Curve and Geometric Mean are sensitive to changes in the sampling distribution ratio, whereas Area Under the Receiver Operating Characteristic Curve is relatively unaffected. In addition, we demonstrate that when comparing sampling methods, borderline-SMOTE2 outperforms the other methods in the first case study, and Random Undersampling is the top performer in the second case study.
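
For readers unfamiliar with the three metrics named in the abstract, the following is a minimal pure-Python sketch of how each can be computed from labels and classifier scores. It is an illustration only, not the authors' implementation (the paper's experiments ran on Apache Spark). The function names, the toy labels and scores, the 0.5 decision threshold, and the use of step-wise average precision as the estimate of the area under the precision-recall curve are all assumptions made here.

```python
import math

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Assumption: score ties between classes are not handled specially.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    true_positives = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos

def geometric_mean(labels, scores, threshold=0.5):
    """sqrt(TPR * TNR) at a fixed decision threshold (0.5 is an assumed default)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return math.sqrt((tp / n_pos) * (tn / n_neg))

# Hypothetical toy data: 2 minority (positive) and 8 majority (negative) instances.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]

print("AUC   :", roc_auc(labels, scores))
print("AUPRC :", average_precision(labels, scores))
print("G-mean:", geometric_mean(labels, scores))
```

In this sketch the two area metrics depend only on how the scores rank the instances, while the Geometric Mean also depends on the chosen threshold. The sketch illustrates the definitions only and does not reproduce the paper's results.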

DOI: 10.1109/IRI.2019.00026
Citation Key: hasanin_comparison_2019