Biblio

Filters: Author is Hancock, John
2022-06-14
Zuech, Richard, Hancock, John, Khoshgoftaar, Taghi M.  2021.  Feature Popularity Between Different Web Attacks with Supervised Feature Selection Rankers. 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA). :30–37.
We introduce the novel concept of feature popularity with three different web attacks and big data from the CSE-CIC-IDS2018 dataset: Brute Force, SQL Injection, and XSS web attacks. Feature popularity is based upon ensemble Feature Selection Techniques (FSTs) and allows us to more easily understand common important features between different cyberattacks, for two main reasons. First, feature popularity lists can be generated to provide an easy comprehension of important features across different attacks. Second, the Jaccard similarity metric can provide a quantitative score for how similar feature subsets are between different attacks. Both of these approaches not only provide more explainable and easier-to-understand models, but they can also reduce the complexity of implementing models in real-world systems. Four supervised learning-based FSTs are used to generate feature subsets for each of our three different web attack datasets, and then our feature popularity frameworks are applied. For these three web attacks, the XSS and SQL Injection feature subsets are the most similar per the Jaccard similarity. The most popular features across all three web attacks are: Flow_Bytes_s, Flow_IAT_Max, and Flow_Packets_s. While this introductory study is a simple example using only three web attacks, the feature popularity concept can be easily extended, allowing an automated framework to more easily determine the most popular features across a very large number of attacks and features.
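As a concrete illustration of the feature popularity and Jaccard similarity ideas described in this abstract, the following Python sketch counts how often features appear across attack-specific subsets and scores the overlap between two subsets. The feature subsets shown are hypothetical placeholders, not the paper's actual rankings.

from collections import Counter

def jaccard_similarity(subset_a, subset_b):
    # |A intersect B| / |A union B| for two collections of feature names.
    a, b = set(subset_a), set(subset_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def feature_popularity(subsets):
    # Count how many attack-specific subsets each feature appears in.
    return Counter(f for subset in subsets for f in set(subset))

# Hypothetical feature subsets produced by supervised feature selection rankers.
brute_force = {"Flow_Bytes_s", "Flow_IAT_Max", "Fwd_Pkt_Len_Mean"}
sql_injection = {"Flow_Bytes_s", "Flow_IAT_Max", "Flow_Packets_s"}
xss = {"Flow_Bytes_s", "Flow_Packets_s", "Flow_IAT_Max", "Bwd_Pkt_Len_Std"}

print(jaccard_similarity(sql_injection, xss))                        # subset similarity score
print(feature_popularity([brute_force, sql_injection, xss]).most_common(3))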
Hancock, John, Khoshgoftaar, Taghi M., Leevy, Joffrey L.  2021.  Detecting SSH and FTP Brute Force Attacks in Big Data. 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA). :760–765.
We present a simple approach for detecting brute force attacks in the CSE-CIC-IDS2018 Big Data dataset. We show our approach is preferable to more complex approaches since it is simpler and yields stronger classification performance. Our contribution is to show that it is possible to train and test simple Decision Tree models with two independent variables to classify CSE-CIC-IDS2018 data with better results than reported in previous research, where more complex Deep Learning models are employed. Moreover, we show that Decision Tree models trained on data with two independent variables perform similarly to Decision Tree models trained on a larger number of independent variables. Our experiments reveal that simple models, with AUC and AUPRC scores greater than 0.99, are capable of detecting brute force attacks in CSE-CIC-IDS2018. To the best of our knowledge, these are the strongest performance metrics published for the machine learning task of detecting these types of attacks. Furthermore, the simplicity of our approach, combined with its strong performance, makes it an appealing technique.
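A minimal sketch of the kind of two-variable Decision Tree this abstract describes, assuming a prepared CSV of CSE-CIC-IDS2018 brute force and benign records; the file name and column names are assumptions for demonstration, not the authors' exact pipeline.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score, average_precision_score

df = pd.read_csv("cse-cic-ids2018-bruteforce.csv")   # hypothetical prepared file
X = df[["Flow_Bytes_s", "Flow_IAT_Max"]]             # only two predictors (assumed names)
y = df["Label"]                                       # 1 = brute force, 0 = benign

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

print("AUC:  ", roc_auc_score(y_te, scores))
print("AUPRC:", average_precision_score(y_te, scores))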
2022-03-01
Leevy, Joffrey L., Hancock, John, Khoshgoftaar, Taghi M., Seliya, Naeem.  2021.  IoT Reconnaissance Attack Classification with Random Undersampling and Ensemble Feature Selection. 2021 IEEE 7th International Conference on Collaboration and Internet Computing (CIC). :41–49.
The exponential increase in the use of Internet of Things (IoT) devices has been accompanied by a spike in cyberattacks on IoT networks. In this research, we investigate the Bot-IoT dataset with a focus on classifying IoT reconnaissance attacks. Reconnaissance attacks are a foundational step in the cyberattack lifecycle. Our contribution is centered on the building of predictive models with the aid of Random Undersampling (RUS) and ensemble Feature Selection Techniques (FSTs). As far as we are aware, this type of experimentation has never been performed for the Reconnaissance attack category of Bot-IoT. Our work uses the Area Under the Receiver Operating Characteristic Curve (AUC) metric to quantify the performance of a diverse range of classifiers: LightGBM, CatBoost, XGBoost, Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), Decision Tree (DT), and a Multilayer Perceptron (MLP). For this study, we determined that the best learners are DT and DT-based ensemble classifiers, the best RUS ratio is 1:1 or 1:3, and the best ensemble FST is our "6 Agree" technique.
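The following Python sketch illustrates random undersampling at a 1:1 ratio followed by a Decision Tree evaluated with AUC, roughly the kind of pipeline the abstract describes. It assumes a prepared Bot-IoT reconnaissance CSV with a binary Label column; the file name and columns are assumptions, and the ensemble feature selection step is omitted.

import pandas as pd
from imblearn.under_sampling import RandomUnderSampler
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv("bot-iot-reconnaissance.csv")        # hypothetical prepared file
X, y = df.drop(columns=["Label"]), df["Label"]        # 1 = reconnaissance, 0 = normal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# sampling_strategy=1.0 yields a 1:1 minority:majority ratio after undersampling;
# a value of 1/3 would yield a 1:3 ratio.
rus = RandomUnderSampler(sampling_strategy=1.0, random_state=0)
X_rus, y_rus = rus.fit_resample(X_tr, y_tr)

clf = DecisionTreeClassifier(random_state=0).fit(X_rus, y_rus)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))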