Biblio
To ensure quality of service and user experience, large Internet companies often monitor various Key Performance Indicators (KPIs) of their systems so that they can detect anomalies and identify failure in real time. However, due to a large number of various KPIs and the lack of high-quality labels, existing KPI anomaly detection approaches either perform well only on certain types of KPIs or consume excessive resources. Therefore, to realize generic and practical KPI anomaly detection in the real world, we propose a KPI anomaly detection framework named iRRCF-Active, which contains an unsupervised and white-box anomaly detector based on Robust Random Cut Forest (RRCF), and an active learning component. Specifically, we novelly propose an improved RRCF (iRRCF) algorithm to overcome the drawbacks of applying original RRCF in KPI anomaly detection. Besides, we also incorporate the idea of active learning to make our model benefit from high-quality labels given by experienced operators. We conduct extensive experiments on a large-scale public dataset and a private dataset collected from a large commercial bank. The experimental resulta demonstrate that iRRCF-Active performs better than existing traditional statistical methods, unsupervised learning methods and supervised learning methods. Besides, each component in iRRCF-Active has also been demonstrated to be effective and indispensable.
In the process of informationization and networking of smart grids, the original physical isolation was broken, potential risks increased, and the increasingly serious cyber security situation was faced. Therefore, it is critical to develop accuracy and efficient anomaly detection methods to disclose various threats. However, in the industry, mainstream security devices such as firewalls are not able to detect and resist some advanced behavior attacks. In this paper, we propose a time series anomaly detection model, which is based on the periodic extraction method of discrete Fourier transform, and determines the sequence position of each element in the period by periodic overlapping mapping, thereby accurately describe the timing relationship between each network message. The experiments demonstrate that our model can detect cyber attacks such as man-in-the-middle, malicious injection, and Dos in a highly periodic network.
This paper describes the technology of neural network application to solve the problem of information security incidents forecasting. We describe the general problem of analyzing and predicting time series in a graphical and mathematical setting. To solve this problem, it is proposed to use a neural network model. To solve the task of forecasting a time series of information security incidents, data are generated and described on the basis of which the neural network is trained. We offer a neural network structure, train the neural network, estimate it's adequacy and forecasting ability. We show the possibility of effective use of a neural network model as a part of an intelligent forecasting system.
Cyber defense can no longer be limited to intrusion detection methods. These systems require malicious activity to enter an internal network before an attack can be detected. Having advanced, predictive knowledge of future attacks allow a potential victim to heighten security and possibly prevent any malicious traffic from breaching the network. This paper investigates the use of Auto-Regressive Integrated Moving Average (ARIMA) models and Bayesian Networks (BN) to predict future cyber attack occurrences and intensities against two target entities. In addition to incident count forecasting, categorical and binary occurrence metrics are proposed to better represent volume forecasts to a victim. Different measurement periods are used in time series construction to better model the temporal patterns unique to each attack type and target configuration, seeing over 86% improvement over baseline forecasts. Using ground truth aggregated over different measurement periods as signals, a BN is trained and tested for each attack type and the obtained results provided further evidence to support the findings from ARIMA. This work highlights the complexity of cyber attack occurrences; each subset has unique characteristics and is influenced by a number of potential external factors.
The IRC botnet is the earliest and most significant botnet group that has a significant impact. Its characteristic is to control multiple zombies hosts through the IRC protocol and constructing command control channels. Relevant research analyzes the large amount of network traffic generated by command interaction between the botnet client and the C&C server. Packet capture traffic monitoring on the network is currently a more effective detection method, but this information does not reflect the essential characteristics of the IRC botnet. The increase in the amount of erroneous judgments has often occurred. To identify whether the botnet control server is a homogenous botnet, dynamic network communication characteristic curves are extracted. For unequal time series, dynamic time warping distance clustering is used to identify the homologous botnets by category, and in order to improve detection. Speed, experiments will use SAX to reduce the dimension of the extracted curve, reducing the time cost without reducing the accuracy.
To identify potential risks to the system security presented by time series it is offered to use wavelet analysis, the indicator of time-and-frequency distribution, the correlation analysis of wavelet-spectra for receiving rather complete range of data about the process studied. The indicator of time-and-frequency localization of time series was proposed allowing to estimate the speed of non-stationary changing. The complex approach is proposed to use the wavelet analysis, the time-and-frequency distribution of time series and the wavelet spectra correlation analysis; this approach contributes to obtaining complete information on the studied phenomenon both in numerical terms, and in the form of visualization for identifying and predicting potential system security threats.
The recently developed deep belief network (DBN) has been shown to be an effective methodology for solving time series forecasting problems. However, the performance of DBN is seriously depended on the reasonable setting of hyperparameters. At present, random search, grid search and Bayesian optimization are the most common methods of hyperparameters optimization. As an alternative, a state-of-the-art derivative-free optimizer-negative correlation search (NCS) is adopted in this paper to decide the sizes of DBN and learning rates during the training processes. A comparative analysis is performed between the proposed method and other popular techniques in the time series forecasting experiment based on two types of time series datasets. Experiment results statistically affirm the efficiency of the proposed model to obtain better prediction results compared with conventional neural network models.
The prevalence of mobile devices and location-based services (LBS) has generated great concerns regarding the LBS users' privacy, which can be compromised by statistical analysis of their movement patterns. A number of algorithms have been proposed to protect the privacy of users in such systems, but the fundamental underpinnings of such remain unexplored. Recently, the concept of perfect location privacy was introduced and its achievability was studied for anonymization-based LBS systems, where user identifiers are permuted at regular intervals to prevent identification based on statistical analysis of long time sequences. In this paper, we significantly extend that investigation by incorporating the other major tool commonly employed to obtain location privacy: obfuscation, where user locations are purposely obscured to protect their privacy. Since anonymization and obfuscation reduce user utility in LBS systems, we investigate how location privacy varies with the degree to which each of these two methods is employed. We provide: (1) achievability results for the case where the location of each user is governed by an i.i.d. process; (2) converse results for the i.i.d. case as well as the more general Markov Chain model. We show that, as the number of users in the network grows, the obfuscation-anonymization plane can be divided into two regions: in the first region, all users have perfect location privacy; and, in the second region, no user has location privacy.
Vehicular Ad Hoc Networks (VANETs) enable vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications that bring many benefits and conveniences to improve the road safety and drive comfort in future transportation systems. Sybil attack is considered one of the most risky threats in VANETs since a Sybil attacker can generate multiple fake identities with false messages to severely impair the normal functions of safety-related applications. In this paper, we propose a novel Sybil attack detection method based on Received Signal Strength Indicator (RSSI), Voiceprint, to conduct a widely applicable, lightweight and full-distributed detection for VANETs. To avoid the inaccurate position estimation according to predefined radio propagation models in previous RSSI-based detection methods, Voiceprint adopts the RSSI time series as the vehicular speech and compares the similarity among all received time series. Voiceprint does not rely on any predefined radio propagation model, and conducts independent detection without the support of the centralized infrastructure. It has more accurate detection rate in different dynamic environments. Extensive simulations and real-world experiments demonstrate that the proposed Voiceprint is an effective method considering the cost, complexity and performance.