Threshold Based Optimization of Performance Metrics with Severely Imbalanced Big Security Data
Title | Threshold Based Optimization of Performance Metrics with Severely Imbalanced Big Security Data |
Publication Type | Conference Paper |
Year of Publication | 2019 |
Authors | Calvert, Chad L., Khoshgoftaar, Taghi M. |
Conference Name | 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI) |
Keywords | area under the receiver operating characteristic curve, AUC, Big Data, big data security metrics, C4.5N decision tree, class imbalance, classification threshold, classifier performance, classifier predictive models, cyber security domain, Decision trees, imbalanced big dataset, learning (artificial intelligence), machine learning techniques, Metrics, optimisation, pattern classification, Performance Metrics, predictive security metrics, pubcrawl, security of data, severely imbalanced big security data, severely imbalanced slow HTTP DoS attack data, Slow HTTP POST, threshold based optimization |
Abstract | Proper evaluation of classifier predictive models requires the selection of appropriate metrics to gauge a model's effectiveness. The Area Under the Receiver Operating Characteristic Curve (AUC) has become the de facto standard metric for evaluating classifier performance. However, recent studies have suggested that AUC is not necessarily the best metric for all types of datasets, especially those in which there exists a high or severe level of class imbalance. There is a need to assess which specific metrics are most useful for evaluating model performance on highly imbalanced big data. In this work, we evaluate the performance of eight machine learning techniques on a severely imbalanced big dataset pertaining to the cyber security domain. We analyze the behavior of six different metrics to determine which provides the best representation of a model's predictive performance. We also evaluate the impact that adjusting the classification threshold has on our metrics. Our results show that the C4.5N decision tree is the optimal learner when evaluating all presented metrics for severely imbalanced Slow HTTP DoS attack data. Based on our results, we propose that the use of AUC alone as a primary metric for evaluating models on highly imbalanced big data may be ineffective, and the evaluation of metrics such as F-measure and Geometric mean can offer substantial insight into the true performance of a given model. |
DOI | 10.1109/ICTAI.2019.00184 |
Citation Key | calvert_threshold_2019 |
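
The abstract describes adjusting the classification threshold and comparing metrics such as F-measure and Geometric mean on imbalanced data. The sketch below illustrates that general idea only; it is not the authors' method, data, or code. It assumes the common definitions (F-measure as the harmonic mean of precision and recall, Geometric mean as the square root of true positive rate times true negative rate), which the abstract does not spell out. The function names and the toy scores are hypothetical.

```python
from math import sqrt


def confusion_counts(y_true, scores, threshold):
    """Count (TP, FP, TN, FN) when scores >= threshold are labeled positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f_measure(tp, fp, fn):
    """F1 score written in terms of counts; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def geometric_mean(tp, fp, tn, fn):
    """sqrt(TPR * TNR); a rate with an empty denominator counts as 0.0."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(tpr * tnr)


def sweep_thresholds(y_true, scores, thresholds):
    """Return (threshold, F-measure, Geometric mean) for each threshold."""
    rows = []
    for t in thresholds:
        tp, fp, tn, fn = confusion_counts(y_true, scores, t)
        rows.append((t, f_measure(tp, fp, fn), geometric_mean(tp, fp, tn, fn)))
    return rows


# Hypothetical toy data: 18 negatives and 2 positives (a 9:1 imbalance).
y_true = [0] * 18 + [1] * 2
scores = [0.05] * 15 + [0.30, 0.35, 0.45] + [0.40, 0.60]

results = sweep_thresholds(y_true, scores, [0.25, 0.4, 0.5])
best_by_f = max(results, key=lambda row: row[1])

for threshold, f1, gmean in results:
    print(f"threshold={threshold:.2f}  F-measure={f1:.3f}  G-mean={gmean:.3f}")
print("threshold with highest F-measure:", best_by_f[0])
```

On this toy data the default threshold of 0.5 misses one of the two positive instances, while a lower threshold recovers it at the cost of one false positive. That kind of trade-off is visible in threshold-dependent metrics but not in a threshold-free summary such as AUC.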