A Machine Learning Approach to Detection of Critical Alerts from Imbalanced Multi-Appliance Threat Alert Logs

Title: A Machine Learning Approach to Detection of Critical Alerts from Imbalanced Multi-Appliance Threat Alert Logs
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Ndichu, Samuel; Ban, Tao; Takahashi, Takeshi; Inoue, Daisuke
Conference Name: 2021 IEEE International Conference on Big Data (Big Data)
Date Published: December
Keywords: alert fatigue, alert screening, Automation, Big Data, class imbalance, cleaning, data cleaning, machine learning, Metrics, Noise measurement, Oversampling, privacy, pubcrawl, security, Support vector machines, threat vectors, Training
Abstract: The extraordinary number of alerts generated by network intrusion detection systems (NIDS) can desensitize security analysts tasked with incident response. Security information and event management systems (SIEMs) perform some rudimentary automation but cannot replicate the decision-making process of a skilled analyst. Machine learning and artificial intelligence (AI) can detect patterns in data with appropriate training. In practice, the majority of alert data comprises false alerts, and true alerts form only a small proportion. Consequently, a naive engine that classifies all security alerts into the majority class can yield a superficially high accuracy close to 100%. Without any correction for the class imbalance, the false alerts dominate algorithmic predictions, resulting in poor generalization performance. We propose a machine-learning approach to address the class imbalance problem in multi-appliance security alert data and to automate the security alert analysis process performed in security operations centers (SOCs). We first used the neighborhood cleaning rule (NCR) to identify and remove ambiguous, noisy, and redundant false alerts. Then, we applied the support vector machine synthetic minority oversampling technique (SVMSMOTE) to generate synthetic true alerts for training. Finally, we fit and evaluated decision tree and random forest classifiers. In experiments using alert data from eight security appliances, we demonstrated that the proposed method can significantly reduce the need for manual auditing, decreasing the number of uninspected alerts and achieving a recall of 99.524%.
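The abstract's point about superficially high accuracy can be illustrated with a small sketch. The alert counts below (10,000 alerts, 50 of them true) are hypothetical values chosen for illustration and do not come from the paper, and the helper names are likewise made up for this example. The sketch only shows why recall, rather than accuracy, is the informative metric on imbalanced alert data; it does not reproduce the authors' NCR and SVMSMOTE pipeline.

```python
# Hypothetical alert log: 10,000 alerts, only 50 of which are true alerts.
# These counts are illustrative assumptions, not figures from the paper.
N_ALERTS = 10_000
N_TRUE = 50

# Label convention used here: 1 = true alert, 0 = false alert.
labels = [1] * N_TRUE + [0] * (N_ALERTS - N_TRUE)


def majority_baseline(n):
    """Naive engine: label every alert as the majority class (false alert)."""
    return [0] * n


def accuracy(y_true, y_pred):
    """Fraction of alerts whose predicted label matches the real label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true alerts that the engine actually flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


predictions = majority_baseline(len(labels))
baseline_accuracy = accuracy(labels, predictions)
baseline_recall = recall(labels, predictions)

print(f"accuracy = {baseline_accuracy:.3f}")  # 0.995
print(f"recall   = {baseline_recall:.3f}")    # 0.000
```

Under these assumed counts the naive engine scores 99.5% accuracy while detecting none of the true alerts, which is why the abstract reports its result as recall.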
DOI: 10.1109/BigData52589.2021.9671956
Citation Key: ndichu_machine_2021