Biblio
Outlier detection is a fundamental data science task with applications ranging from data cleaning to network security. Recently, a new class of outlier detection algorithms has emerged, called contextual outlier detection, and has shown improved performance when studying anomalous behavior in a specific context. However, as we point out in this article, such approaches have limited applicability in situations where the context is sparse (i.e., lacking a suitable frame of reference). Moreover, approaches developed to date do not scale to large datasets. To address these problems, here we propose a novel and robust approach alternative to the state-of-the-art called RObust Contextual Outlier Detection (ROCOD). We utilize a local and global behavioral model based on the relevant contexts, which is then integrated in a natural and robust fashion. We run ROCOD on both synthetic and real-world datasets and demonstrate that it outperforms other competitive baselines on the axes of efficacy and efficiency. We also drill down and perform a fine-grained analysis to shed light on the rationale for the performance gains of ROCOD and reveal its effectiveness when handling objects with sparse contexts.
We propose a novel, scalable, and principled graph sketching technique based on minwise hashing of local neighborhood. For an n-node graph with e-edges (e textgreatertextgreater n), we incrementally maintain in real-time a minwise neighbor sampled subgraph using k hash functions in O(n x k) memory, limit being user-configurable by the parameter k. Symmetrization and similarity based techniques can recover from these data structures a significant portion of the original graph. We present theoretical analysis of the minwise sampling strategy and also derive unbiased estimators for important graph properties such as triangle count and neighborhood overlap. We perform an extensive empirical evaluation of our graph sketch and it's derivatives on a wide variety of real-world graph data sets drawn from different application domains using important large network analysis algorithms: local and global clustering coefficient, PageRank, and local graph sparsification. With bounded memory, the quality of results using the sketch representation is competitive against baselines which use the full graph, and the computational performance is often better. Our framework is flexible and configurable to be leveraged by numerous other graph analytics algorithms, potentially reducing the information mining time on large streamed graphs for a variety of applications.
Outlier detection is a fundamental data science task with applications ranging from data cleaning to network security. Recently, a new class of outlier detection algorithms has emerged, called contextual outlier detection, and has shown improved performance when studying anomalous behavior in a specific context. However, as we point out in this article, such approaches have limited applicability in situations where the context is sparse (i.e., lacking a suitable frame of reference). Moreover, approaches developed to date do not scale to large datasets. To address these problems, here we propose a novel and robust approach alternative to the state-of-the-art called RObust Contextual Outlier Detection (ROCOD). We utilize a local and global behavioral model based on the relevant contexts, which is then integrated in a natural and robust fashion. We run ROCOD on both synthetic and real-world datasets and demonstrate that it outperforms other competitive baselines on the axes of efficacy and efficiency. We also drill down and perform a fine-grained analysis to shed light on the rationale for the performance gains of ROCOD and reveal its effectiveness when handling objects with sparse contexts.