This paper proposes a highly scalable framework for detecting network anomalies at the per-flow level by constructing a meta-model for a family of machine learning algorithms or statistical data models. The approach is scalable and feasible because the raw data needs to be accessed only once: it is processed and transformed into a meta-model matrix of much smaller size that can reside in system RAM. The meta-model matrix is computed through disposable per-row updates: once a flow record has been processed, it is no longer needed for calculating the matrix. While the proposed framework covers both Gaussian and non-Gaussian data, this work focuses on linear regression models. Specifically, a new concept called meta-model sufficient statistics is proposed to analyze a group of models, for which exact, rather than approximate, results are derived. In addition, the proposed framework can quickly discover an optimal statistical or computer model from a family of candidate models without rescanning the raw dataset. This suggests that an extremely efficient and effective theory and method for big data security analysis is possible.
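The abstract does not spell out how the meta-model matrix is built. A minimal sketch, assuming it is the augmented cross-product matrix Z^T Z over rows z = [1, x_1, ..., x_p, y] (our assumption; this is the standard sufficient statistic for least squares), shows the two properties claimed above. Each flow record contributes one rank-one update and can then be discarded. The exact least-squares fit of any sub-model is then read off sub-blocks of that matrix without rescanning the data. The class `MetaModel`, its methods, the synthetic data, and the BIC selection criterion are hypothetical illustrations, not the paper's method.

```python
import itertools

import numpy as np


class MetaModel:
    """Hypothetical meta-model: the augmented cross-product matrix M = Z^T Z,
    where each row is z = [1, x_1, ..., x_p, y] (an assumed construction)."""

    def __init__(self, num_features):
        d = num_features + 2  # intercept column + p features + response
        self.m = np.zeros((d, d))
        self.n = 0  # number of flow records seen so far

    def update(self, x, y):
        """Disposable per-row update; the raw row is not stored."""
        z = np.concatenate(([1.0], np.asarray(x, dtype=float), [float(y)]))
        self.m += np.outer(z, z)  # rank-one update of Z^T Z
        self.n += 1

    def fit(self, features):
        """Exact OLS fit (up to rounding) on the intercept plus the given
        0-based feature indices, using only sub-blocks of M."""
        cols = [0] + [f + 1 for f in features]
        y_col = self.m.shape[0] - 1
        xtx = self.m[np.ix_(cols, cols)]  # X^T X for this sub-model
        xty = self.m[cols, y_col]  # X^T y for this sub-model
        beta = np.linalg.solve(xtx, xty)
        rss = self.m[y_col, y_col] - beta @ xty  # y^T y - beta^T X^T y
        return beta, rss

    def best_subset(self):
        """Exhaustive search over all feature subsets, scored by BIC
        (an assumed criterion); no raw data is revisited."""
        p = self.m.shape[0] - 2
        best, best_bic = None, np.inf
        for k in range(p + 1):
            for subset in itertools.combinations(range(p), k):
                _, rss = self.fit(subset)
                bic = self.n * np.log(rss / self.n) + (k + 1) * np.log(self.n)
                if bic < best_bic:
                    best, best_bic = subset, bic
        return best, best_bic


# Usage with synthetic "per-flow" rows (hypothetical data). X and y are kept
# here only so the result can be checked; a real stream would discard them.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 + 3.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

meta = MetaModel(num_features=3)
for row, target in zip(X, y):
    meta.update(row, target)  # single pass over the raw data

best_subset, best_bic = meta.best_subset()
print("selected features:", best_subset)
```

The memory footprint is (p + 2)^2 numbers regardless of how many flows are streamed, which is what lets the matrix stay in RAM. Two caveats of this sketch: exhaustive search visits 2^p candidate models, and accumulating raw (uncentered) cross-products can lose precision when features are badly scaled, in which case centered streaming updates would be the safer choice.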