Title | Tackling Class Imbalance in Cyber Security Datasets |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Wheelus, C., Bou-Harb, E., Zhu, X. |
Conference Name | 2018 IEEE International Conference on Information Reuse and Integration (IRI) |
Keywords | class imbalance, class imbalance problem, compositionality, computer security, cyber security, cyber security datasets, cyber-attacks, information infrastructure, Information Reuse and Security, learning (artificial intelligence), machine learning, machine learning algorithms, old attacks, outdated datasets, Predictive models, pubcrawl, Resiliency, SANTA dataset, security of data, Size measurement, Time measurement, Training, UNSW-NB15, Velocity measurement |
Abstract | It is clear that cyber-attacks are a danger that must be addressed with great resolve, as they threaten the information infrastructure upon which we all depend. Many studies have been published expressing varying levels of success with machine learning approaches to combating cyber-attacks, but many modern studies still focus on training and evaluating with very outdated datasets containing old attacks that are no longer a threat, and also lack data on new attacks. Recent datasets like UNSW-NB15 and SANTA have been produced to address this problem. Even so, these modern datasets suffer from class imbalance, which reduces the efficacy of predictive models trained using these datasets. Herein we evaluate several pre-processing methods for addressing the class imbalance problem, using several of the most popular machine learning algorithms and a variant of UNSW-NB15 based upon the attributes from the SANTA dataset. |
DOI | 10.1109/IRI.2018.00041 |
Citation Key | wheelus_tackling_2018 |