Practical and White-Box Anomaly Detection through Unsupervised and Active Learning

Submitted by grigby1 on Thu, 03/04/2021 - 2:35pm

Title	Practical and White-Box Anomaly Detection through Unsupervised and Active Learning
Publication Type	Conference Paper
Year of Publication	2020
Authors	Wang, Y., Wang, Z., Xie, Z., Zhao, N., Chen, J., Zhang, W., Sui, K., Pei, D.
Conference Name	2020 29th International Conference on Computer Communications and Networks (ICCCN)
Keywords	active learning, anomaly detection, composability, Forestry, iRRCF, key performance indicators, KPI anomaly detection framework, Labeling, Metrics, Monitoring, Neural networks, pubcrawl, random forests, resilience, Resiliency, robust random cut forest, RRCF, RRCF algorithm, security, security of data, supervised learning, time series, Time series analysis, Unsupervised Anomaly Detection, unsupervised learning, user experience, white box, White Box Security, white-box anomaly detection
Abstract	To ensure quality of service and user experience, large Internet companies often monitor various Key Performance Indicators (KPIs) of their systems so that they can detect anomalies and identify failures in real time. However, due to the large number of diverse KPIs and the lack of high-quality labels, existing KPI anomaly detection approaches either perform well only on certain types of KPIs or consume excessive resources. Therefore, to realize generic and practical KPI anomaly detection in the real world, we propose a KPI anomaly detection framework named iRRCF-Active, which contains an unsupervised and white-box anomaly detector based on Robust Random Cut Forest (RRCF), and an active learning component. Specifically, we propose an improved RRCF (iRRCF) algorithm to overcome the drawbacks of applying the original RRCF to KPI anomaly detection. In addition, we incorporate the idea of active learning so that our model benefits from high-quality labels given by experienced operators. We conduct extensive experiments on a large-scale public dataset and a private dataset collected from a large commercial bank. The experimental results demonstrate that iRRCF-Active performs better than existing traditional statistical methods, unsupervised learning methods and supervised learning methods. Moreover, each component of iRRCF-Active is shown to be effective and indispensable.
DOI	10.1109/ICCCN49398.2020.9209704
Citation Key	wang_practical_2020
