A Scalable Meta-Model for Big Data Security Analyses

Submitted by grigby1 on Fri, 11/03/2017 - 11:31am

Title	A Scalable Meta-Model for Big Data Security Analyses
Publication Type	Conference Paper
Year of Publication	2016
Authors	Yang, B., Zhang, T.
Conference Name	2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS)
Date Published	April 2016
Publisher	IEEE
ISBN Number	978-1-5090-2403-2
Keywords	Big Data, Big Data security analysis, Conferences, Data models, learning (artificial intelligence), linear regression, linear regression models, linear regressions, machine learning algorithms, matrix algebra, Meta-Model, meta-model matrix, meta-model sufficient statistics, network anomaly detection, Predictive models, pubcrawl, regression analysis, Scalability, scalable meta-model, Scalable Security, security, security analyses, security of data, statistical data models, sufficient statistics, Training data
Abstract	This paper proposes a highly scalable framework for detecting network anomalies at the per-flow level by constructing a meta-model for a family of machine learning algorithms or statistical data models. The approach is scalable and attainable because the raw data needs to be accessed only once: it is processed and transformed into a much smaller meta-model matrix that can reside in system RAM. The meta-model matrix can be computed through disposable, row-level update operations: once a flow record has been processed, it is no longer needed for calculating the meta-model matrix. While the proposed framework covers both Gaussian and non-Gaussian data, this work focuses on linear regression models. Specifically, a new concept called meta-model sufficient statistics is proposed for analyzing a group of models, from which exact, rather than approximate, results are derived. In addition, the proposed framework can quickly discover an optimal statistical or computer model from a family of candidate models without rescanning the raw dataset. This suggests that an extremely efficient and effective theory and method are possible for big data security analysis.
URL	https://ieeexplore.ieee.org/document/7502264/
DOI	10.1109/BigDataSecurity-HPSC-IDS.2016.71
Citation Key	yang_scalable_2016
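
The one-pass idea in the abstract can be illustrated for linear regression with a minimal sketch. It is not the authors' implementation: it assumes the meta-model matrix is the augmented cross-product matrix of each record (1, x_1, ..., x_p, y), which is the standard sufficient statistic for ordinary least squares. The function names (build_meta_matrix, fit_submodel, and so on) and the toy data are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Matrix = List[List[float]]


def update_meta_matrix(meta: Matrix, row: Sequence[float]) -> None:
    """Add one record's outer product z z^T to the running matrix, in place."""
    z = [1.0, *row]  # the leading 1.0 is the intercept column
    for i, zi in enumerate(z):
        for j, zj in enumerate(z):
            meta[i][j] += zi * zj


def build_meta_matrix(rows: Iterable[Sequence[float]], width: int) -> Matrix:
    """Single pass over rows of (x_1, ..., x_p, y); each row is discarded after use.

    Assumption: this cross-product matrix stands in for the paper's
    meta-model matrix, whose exact definition is not given in the abstract.
    """
    meta = [[0.0] * (width + 1) for _ in range(width + 1)]
    for row in rows:
        update_meta_matrix(meta, row)
    return meta


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_submodel(meta: Matrix, predictors: Sequence[int]) -> Tuple[List[float], float]:
    """Least-squares fit of y on an intercept plus the chosen predictors.

    predictors holds 1-based column indices into the original rows. Only the
    small matrix is read, so candidate models need no rescan of the raw data.
    Returns (coefficients, residual sum of squares).
    """
    idx = [0, *predictors]
    y = len(meta) - 1
    xtx = [[meta[i][j] for j in idx] for i in idx]
    xty = [meta[i][y] for i in idx]
    beta = solve(xtx, xty)
    rss = meta[y][y] - sum(b * v for b, v in zip(beta, xty))  # y'y - beta' X'y
    return beta, rss


# Toy records (x1, x2, y) with y = 1 + 2 * x1; passed as an iterator to
# mimic a stream that can be read only once.
records = [(0.0, 1.0, 1.0), (1.0, 0.0, 3.0), (2.0, 1.0, 5.0), (3.0, 0.0, 7.0)]
meta = build_meta_matrix(iter(records), width=3)
for candidate in ([1], [2], [1, 2]):
    coefficients, rss = fit_submodel(meta, candidate)
    print(candidate, [round(c, 6) for c in coefficients], round(rss, 6))
```

With p predictors the matrix has (p + 2)^2 entries regardless of how many records are scanned, which is why it can stay in memory while the raw data is streamed. Comparing the residual sums of squares across candidate predictor sets is one simple way to rank models from the same matrix; the paper's own selection criterion may differ.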
