A Scalable Meta-Model for Big Data Security Analyses

Title: A Scalable Meta-Model for Big Data Security Analyses
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Yang, B., Zhang, T.
Conference Name: 2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS)
Date Published: April 2016
Publisher: IEEE
ISBN Number: 978-1-5090-2403-2
Keywords: Big Data, Big Data security analysis, Conferences, Data models, learning (artificial intelligence), linear regression, linear regression models, linear regressions, machine learning algorithms, matrix algebra, Meta-Model, meta-model matrix, meta-model sufficient statistics, network anomaly detection, Predictive models, pubcrawl, regression analysis, Scalability, scalable meta-model, Scalable Security, security, security analyses, security of data, statistical data models, sufficient statistics, Training data
Abstract

This paper proposes a highly scalable framework for detecting network anomalies at the per-flow level by constructing a meta-model for a family of machine learning algorithms or statistical data models. The approach is scalable and attainable because the raw data needs to be accessed only once: it is processed and transformed into a meta-model matrix of much smaller size that can reside in system RAM. The meta-model matrix can be computed through disposable updating operations at the per-row level: once a flow's information has been processed, it is no longer needed for calculating the meta-model matrix. While the proposed framework covers both Gaussian and non-Gaussian data, the focus of this work is on linear regression models. Specifically, a new concept called meta-model sufficient statistics is proposed to analyze a group of models, for which exact, not approximate, results are derived. In addition, the proposed framework can quickly discover an optimal statistical or computer model from a family of candidate models without rescanning the raw dataset. This suggests that an extremely efficient and effective theory and method are possible for big data security analysis.
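
The single-pass idea in the abstract can be illustrated with the textbook cross-product matrix for linear regression. The sketch below is not taken from the paper and may differ from the authors' construction. It assumes each row is augmented to z = (1, x_1, ..., x_p, y) and that the running sum of z zᵀ is kept. The names `MomentMatrix`, `update`, `fit`, `rss`, and `solve` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


class MomentMatrix:
    """Running sum of z z^T for rows z = (1, x_1, ..., x_p, y).

    Assumption: this is the standard sufficient statistic for ordinary
    least squares, used here as a stand-in for the meta-model matrix.
    """

    def __init__(self, n_features: int) -> None:
        self.d = n_features + 2  # intercept + features + response
        self.m = [[0.0] * self.d for _ in range(self.d)]

    def update(self, x: Sequence[float], y: float) -> None:
        """Fold one row into the matrix; the row is not needed afterwards."""
        z = [1.0, *x, y]
        for i in range(self.d):
            for j in range(self.d):
                self.m[i][j] += z[i] * z[j]

    def _blocks(self, features: Sequence[int]):
        idx = [0] + [f + 1 for f in features]  # 0 is the intercept column
        xtx = [[self.m[i][j] for j in idx] for i in idx]
        xty = [self.m[i][self.d - 1] for i in idx]
        return xtx, xty

    def fit(self, features: Sequence[int]) -> List[float]:
        """Exact least-squares coefficients (intercept first) for a feature
        subset, computed from the stored matrix without touching raw rows."""
        xtx, xty = self._blocks(features)
        return solve(xtx, xty)

    def rss(self, features: Sequence[int]) -> float:
        """Residual sum of squares of the fitted sub-model: y'y - beta' X'y."""
        _, xty = self._blocks(features)
        beta = self.fit(features)
        yty = self.m[self.d - 1][self.d - 1]
        return yty - sum(b * c for b, c in zip(beta, xty))
```

Under these assumptions, a stream of flows is consumed once with `update`, after which `fit` and `rss` can be called for any subset of features, so candidate regression models can be compared without rescanning the raw data. The stored matrix has (p + 2) × (p + 2) entries regardless of how many rows were processed.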

URL: https://ieeexplore.ieee.org/document/7502264/
DOI: 10.1109/BigDataSecurity-HPSC-IDS.2016.71
Citation Key: yang_scalable_2016