Visible to the public Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs

TitleFast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs
Publication TypeConference Paper
Year of Publication2016
AuthorsManzoor, Emaad, Milajerdi, Sadegh M., Akoglu, Leman
Conference NameProceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Date PublishedAugust 2016
PublisherACM
Conference LocationNew York, NY, USA
ISBN Number978-1-4503-4232-2
Keywordsadvanced persistent threat, anomaly detection, Attack Graphs, attack vectors, composability, dynamic networks, edge detection, evolving graphs, graph sketches, heterogenous graphs, Human Behavior, Metrics, pubcrawl, Resiliency, Scalability, security, streamspot, temporal networks, threat vectors, Time Frequency Analysis, typed graphs
Abstract

Given a stream of heterogeneous graphs containing different types of nodes and edges, how can we spot anomalous ones in real-time while consuming bounded memory? This problem is motivated by and generalizes from its application in security to host-level advanced persistent threat (APT) detection. We propose StreamSpot, a clustering based anomaly detection approach that addresses challenges in two key fronts: (1) heterogeneity, and (2) streaming nature. We introduce a new similarity function for heterogeneous graphs that compares two graphs based on their relative frequency of local substructures, represented as short strings. This function lends itself to a vector representation of a graph, which is (a) fast to compute, and (b) amenable to a sketched version with bounded size that preserves similarity. StreamSpot exhibits desirable properties that a streaming application requires: it is (i) fully-streaming; processing the stream one edge at a time as it arrives, (ii) memory-efficient; requiring constant space for the sketches and the clustering, (iii) fast; taking constant time to update the graph sketches and the cluster summaries that can process over 100,000 edges per second, and (iv) online; scoring and flagging anomalies in real time. Experiments on datasets containing simulated system-call flow graphs from normal browser activity and various attack scenarios (ground truth) show that StreamSpot is high-performance; achieving above 95% detection accuracy with small delay, as well as competitive time and memory usage.

URLhttps://dl.acm.org/doi/10.1145/2939672.2939783
DOI10.1145/2939672.2939783
Citation Keymanzoor_fast_2016