Biblio

Filters: Keyword is imbalanced data
2020-12-01
Abdulhammed, R., Faezipour, M., Musafer, H., Abuzneid, A. 2019. Efficient Network Intrusion Detection Using PCA-Based Dimensionality Reduction of Features. 2019 International Symposium on Networks, Computers and Communications (ISNCC). :1–6.

Designing a machine learning based network intrusion detection system (IDS) with high-dimensional features can lead to prolonged classification processes, whereas low-dimensional features can shorten them. Moreover, classification of network traffic with imbalanced class distributions has posed a significant drawback on the performance attainable by most well-known classifiers. In the presence of imbalanced data, the known metrics may fail to provide adequate information about the performance of the classifier. This study first uses Principal Component Analysis (PCA) as a feature dimensionality reduction approach. The resulting low-dimensional features are then used to build various classifiers, such as Random Forest (RF), Bayesian Network, Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis (QDA), for designing an IDS. The experimental findings with low-dimensional features in binary and multi-class classification show better performance in terms of Detection Rate (DR), F-Measure, False Alarm Rate (FAR), and Accuracy. Furthermore, in this paper, we apply a Multi-Class Combined performance metric (CombinedMc) with respect to class distribution, incorporating FAR, DR, Accuracy, and class distribution parameters. In addition, we developed a uniform-distribution-based balancing approach to handle the imbalanced distribution of the minority class instances in the CICIDS2017 network intrusion dataset. We were able to reduce the CICIDS2017 dataset's feature dimensions from 81 to 10 using PCA, while maintaining a high accuracy of 99.6% in multi-class and binary classification.
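As a rough illustration of the pipeline this abstract describes, the sketch below reduces an 81-feature, imbalanced synthetic stand-in for CICIDS2017 to 10 principal components and trains RF, LDA, and QDA classifiers on the result. It uses scikit-learn and synthetic data, and is not the authors' implementation or their CombinedMc metric.

```python
# Hypothetical sketch: PCA to 10 components, then several classifiers,
# on a synthetic imbalanced stand-in for CICIDS2017 (not the real dataset).
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 81 features, roughly 95% "benign" vs 5% "attack".
X, y = make_classification(n_samples=20000, n_features=81, n_informative=30,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in [("RF", RandomForestClassifier(n_estimators=100, random_state=0)),
                  ("LDA", LinearDiscriminantAnalysis()),
                  ("QDA", QuadraticDiscriminantAnalysis())]:
    # Scale, project to 10 principal components, then classify.
    model = make_pipeline(StandardScaler(), PCA(n_components=10), clf)
    model.fit(X_tr, y_tr)
    print(name)
    print(classification_report(y_te, model.predict(X_te), digits=3))
```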

2020-11-20
Roy, D. D., Shin, D. 2019. Network Intrusion Detection in Smart Grids for Imbalanced Attack Types Using Machine Learning Models. 2019 International Conference on Information and Communication Technology Convergence (ICTC). :576–581.
The smart grid has evolved as the next-generation power grid paradigm, enabling the transfer of real-time information between the utility company and the consumer via smart meters and advanced metering infrastructure (AMI). This information facilitates many services for both parties, such as automatic meter reading, demand-side management, and time-of-use (TOU) pricing. However, there have been growing security and privacy concerns over smart grid systems, which are built with both smart and legacy information and operational technologies. Intrusion detection is a critical security service for smart grid systems, alerting the system operator to the presence of ongoing attacks. Hence, much research has been conducted on intrusion detection, especially anomaly-based intrusion detection. Problems emerge when common pattern recognition approaches are applied to imbalanced data, in which far more instances belong to normal behavior than to attacks; such approaches yield low detection rates for the minority classes. In this paper, we study various machine learning models to overcome this drawback using the CIC-IDS2018 dataset [1].
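The minority-class weakness highlighted here can be made concrete with a small, hypothetical example: a classifier trained on heavily skewed synthetic multi-class traffic reports acceptable overall accuracy while its per-class detection rate (recall) for rare attack types stays low. The sketch below uses scikit-learn and synthetic data, not the paper's models or the CIC-IDS2018 dataset, and also shows one common mitigation (class weighting).

```python
# Hypothetical sketch: per-class detection rate (recall) under class imbalance,
# with and without class weighting. Not the paper's experimental setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic multi-class traffic: one dominant "normal" class, three rare attack types.
X, y = make_classification(n_samples=30000, n_features=20, n_informative=10,
                           n_classes=4, weights=[0.90, 0.06, 0.03, 0.01],
                           n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for cw in (None, "balanced"):
    clf = RandomForestClassifier(n_estimators=100, class_weight=cw, random_state=0)
    clf.fit(X_tr, y_tr)
    # Recall per class: detection rate for the normal class and each attack type.
    per_class_recall = recall_score(y_te, clf.predict(X_te), average=None)
    print(f"class_weight={cw}: per-class detection rates {np.round(per_class_recall, 3)}")
```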
2020-08-28
Hasanin, Tawfiq, Khoshgoftaar, Taghi M., Leevy, Joffrey L. 2019. A Comparison of Performance Metrics with Severely Imbalanced Network Security Big Data. 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI). :83–88.

Severe class imbalance between the majority and minority classes in large datasets can bias Machine Learning classifiers toward the majority class. Our work uniquely consolidates two case studies, each utilizing three learners implemented within an Apache Spark framework, six sampling methods, and five sampling distribution ratios to analyze the effect of severe class imbalance on big data analytics. We evaluate this study with three performance metrics: Area Under the Receiver Operating Characteristic Curve, Area Under the Precision-Recall Curve, and Geometric Mean. In the first case study, models were trained on one dataset (POST) and tested on another (SlowlorisBig). In the second case study, the training and testing dataset roles were switched. Our comparison of performance metrics shows that Area Under the Precision-Recall Curve and Geometric Mean are sensitive to changes in the sampling distribution ratio, whereas Area Under the Receiver Operating Characteristic Curve is relatively unaffected. In addition, we demonstrate that when comparing sampling methods, borderline-SMOTE2 outperforms the other methods in the first case study, and Random Undersampling is the top performer in the second case study.
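A simplified, single-machine analogue of this comparison (not the authors' Apache Spark pipelines or their POST/SlowlorisBig datasets) can be sketched with scikit-learn and imbalanced-learn: resample a severely imbalanced synthetic training set with borderline-SMOTE2 or Random Undersampling, then report Area Under the ROC Curve, Area Under the Precision-Recall Curve, and Geometric Mean on a held-out test set. The sampling ratio of 0.1 below is an illustrative choice, not one of the paper's five ratios.

```python
# Hypothetical sketch: comparing AUC, AUPRC, and G-mean under different
# sampling methods on a severely imbalanced synthetic dataset.
# Requires scikit-learn and imbalanced-learn.
from imblearn.metrics import geometric_mean_score
from imblearn.over_sampling import BorderlineSMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic severely imbalanced data: roughly 0.5% positive (minority) class.
X, y = make_classification(n_samples=50000, n_features=20,
                           weights=[0.995, 0.005], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

samplers = {
    "no sampling": None,
    "borderline-SMOTE2": BorderlineSMOTE(kind="borderline-2", random_state=0),
    "random undersampling": RandomUnderSampler(sampling_strategy=0.1, random_state=0),
}
for name, sampler in samplers.items():
    # Resample only the training split; the test split stays imbalanced.
    Xs, ys = (X_tr, y_tr) if sampler is None else sampler.fit_resample(X_tr, y_tr)
    clf = LogisticRegression(max_iter=1000).fit(Xs, ys)
    scores = clf.predict_proba(X_te)[:, 1]
    preds = clf.predict(X_te)
    print(f"{name}: AUC={roc_auc_score(y_te, scores):.3f} "
          f"AUPRC={average_precision_score(y_te, scores):.3f} "
          f"G-mean={geometric_mean_score(y_te, preds):.3f}")
```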