Anomaly detection for cyber-security defence has garnered much attention in recent years, providing an orthogonal approach to traditional signature-based detection systems. Anomaly detection relies on building probability models of normal computer network behaviour and detecting deviations from the model. Most data sets used for cyber-security have a mix of user-driven events and automated network events, which most often appear as polling behaviour. Separating these automated events from those caused by human activity is essential to building good statistical models for anomaly detection. This article presents a changepoint detection framework for identifying automated network events appearing as periodic subsequences of event times. The opening event of each subsequence is interpreted as a human action which then generates an automated, periodic process. Difficulties arising from the presence of duplicate and missing data are addressed. The methodology is demonstrated using authentication data from Los Alamos National Laboratory's enterprise computer network.
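To make the idea of periodic subsequences of event times concrete, the following is a minimal sketch and not the changepoint framework the article presents. It swaps in a simple greedy rule: consecutive events belong to the same run while the gap between them stays within a tolerance of the run's first gap, and the opening event of each run is treated as the candidate human action. The function name `split_periodic_runs`, the parameters `tol` and `min_len`, and the sample event times are all hypothetical, and the sketch does not handle duplicate or missing events.

```python
def split_periodic_runs(times, tol=0.5, min_len=3):
    """Greedy segmentation of sorted event times into near-periodic runs.

    Illustrative stand-in only (not the article's changepoint method).
    Returns a list of (start_index, end_index, period) with inclusive indices.
    """
    runs = []
    i = 0
    n = len(times)
    while i < n - 1:
        # Assumption: the first gap of a candidate run defines its period.
        period = times[i + 1] - times[i]
        j = i + 1
        # Extend the run while successive gaps stay close to that period.
        while j + 1 < n and abs((times[j + 1] - times[j]) - period) <= tol:
            j += 1
        if j - i + 1 >= min_len:
            runs.append((i, j, period))
            i = j + 1  # the next event may open a new run
        else:
            i += 1     # too short to call periodic; move on
    return runs


# Hypothetical event times (seconds): a 10 s polling run, then a 5 s run.
event_times = [0, 10, 20, 30, 47, 52, 57, 62, 67]
runs = split_periodic_runs(event_times)
# Opening event of each run: the candidate human-initiated action.
openers = [event_times[start] for start, _, _ in runs]
```

A changepoint formulation would replace the fixed tolerance with a statistical test for a change in the period, which is where robustness to duplicated and missing events would have to be built in.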
The continuing advance of cloud-based computer networks has generated a number of security challenges associated with intrusions into network systems. With the exponential increase in the volume of network traffic data, involving humans in such detection systems is time-consuming and non-trivial. In addition, network traffic data tends to be high-dimensional, comprising numerous features and attributes, which makes classification challenging and susceptible to the curse of dimensionality. Such scenarios call for dimensionality reduction and feature selection, combined with machine-learning techniques, in the classification of such data. As a contribution, this paper therefore employs data mining techniques in a cloud-based environment, selecting for the classification the attributes and features with the least importance in terms of weight. The standard practice is to select features with higher weights while ignoring those with the lowest weights. In this study, we investigate whether predictions can be made using the features with the lowest weights. The motivation is that adversaries use stealth to hide their activities from the obvious. The question, then, is whether any stealth activity of an adversary can be predicted using the least observed attributes. In this particular study, we employ information gain to select the attributes with the lowest weights and then apply machine learning to classify whether a combination, in this case of source and destination ports, is attacked or not. Our preliminary results show that even when the source and destination port attributes are used in combination with the features with the lowest weights, it is possible to classify such network traffic data and predict whether an attack will occur.
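The selection step described above, ranking attributes by information gain and keeping those with the lowest weights, can be sketched as follows. This is a minimal illustration for categorical attributes, not the paper's pipeline: the function names, the toy records, and the attribute names `proto`, `flag` and `size` are all hypothetical, and no classifier is included.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus their entropy conditioned on one attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def lowest_weight_features(rows, labels, k):
    """Return the k attribute names with the smallest information gain."""
    gains = {
        name: information_gain([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(gains, key=gains.get)[:k]


# Hypothetical toy records; real traffic data would have many more attributes.
rows = [
    {"proto": "tcp", "flag": "a", "size": "s"},
    {"proto": "tcp", "flag": "b", "size": "s"},
    {"proto": "udp", "flag": "a", "size": "s"},
    {"proto": "udp", "flag": "b", "size": "l"},
]
labels = ["attack", "attack", "normal", "normal"]
selected = lowest_weight_features(rows, labels, k=2)
```

In the study's setting, the selected low-weight attributes would then be combined with the source and destination port attributes and passed to a classifier. Continuous attributes would need to be discretised before this calculation applies.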