Biblio
In the modern day and age, credential based authentication systems no longer provide the level of security that many organisations and their services require. The level of trust in passwords has plummeted in recent years, with waves of cyber attacks predicated on compromised and stolen credentials. This method of authentication is also heavily reliant on the individual user's choice of password. There is the potential to build levels of security on top of credential based authentication systems, using a risk based approach, which preserves the seamless authentication experience for the end user. One method of adding this security to a risk based authentication framework, is keystroke dynamics. Monitoring the behaviour of the users and how they type, produces a type of digital signature which is unique to that individual. Learning this behaviour allows dynamic flags to be applied to anomalous typing patterns that are produced by attackers using stolen credentials, as a potential risk of fraud. Methods from statistics and machine learning have been explored to try and implement such solutions. This paper will look at an Autoencoder model for learning the keystroke dynamics of specific users. The results from this paper show an improvement over the traditional tried and tested statistical approaches with an Equal Error Rate of 6.51%, with the additional benefits of relatively low training times and less reliance on feature engineering.
The aim of this paper is to show the importance of Computational Stylometry (CS) and Machine Learning (ML) support in author's gender and age detection in cyberbullying texts. We developed a cyberbullying detection platform and we show the results of performances in terms of Precision, Recall and F -Measure for gender and age detection in cyberbullying texts we collected.
The greatest threat towards securing the organization and its assets are no longer the attackers attacking beyond the network walls of the organization but the insiders present within the organization with malicious intent. Existing approaches helps to monitor, detect and prevent any malicious activities within an organization's network while ignoring the human behavior impact on security. In this paper we have focused on user behavior profiling approach to monitor and analyze user behavior action sequence to detect insider threats. We present an ensemble hybrid machine learning approach using Multi State Long Short Term Memory (MSLSTM) and Convolution Neural Networks (CNN) based time series anomaly detection to detect the additive outliers in the behavior patterns based on their spatial-temporal behavior features. We find that using Multistate LSTM is better than basic single state LSTM. The proposed method with Multistate LSTM can successfully detect the insider threats providing the AUC of 0.9042 on train data and AUC of 0.9047 on test data when trained with publically available dataset for insider threats.
Recently, malicious insider attacks represent one of the most damaging threats to companies and government agencies. This paper proposes a new framework in constructing a user-centered machine learning based insider threat detection system on multiple data granularity levels. System evaluations and analysis are performed not only on individual data instances but also on normal and malicious insiders, where insider scenario specific results and delay in detection are reported and discussed. Our results show that the machine learning based detection system can learn from limited ground truth and detect new malicious insiders with a high accuracy.
With the rapidly increasing connectivity in cyberspace, Insider Threat is becoming a huge concern. Insider threat detection from system logs poses a tremendous challenge for human analysts. Analyzing log files of an organization is a key component of an insider threat detection and mitigation program. Emerging machine learning approaches show tremendous potential for performing complex and challenging data analysis tasks that would benefit the next generation of insider threat detection systems. However, with huge sets of heterogeneous data to analyze, applying machine learning techniques effectively and efficiently to such a complex problem is not straightforward. In this paper, we extract a concise set of features from the system logs while trying to prevent loss of meaningful information and providing accurate and actionable intelligence. We investigate two unsupervised anomaly detection algorithms for insider threat detection and draw a comparison between different structures of the system logs including daily dataset and periodically aggregated one. We use the generated anomaly score from the previous cycle as the trust score of each user fed to the next period's model and show its importance and impact in detecting insiders. Furthermore, we consider the psychometric score of users in our model and check its effectiveness in predicting insiders. As far as we know, our model is the first one to take the psychometric score of users into consideration for insider threat detection. Finally, we evaluate our proposed approach on CERT insider threat dataset (v4.2) and show how it outperforms previous approaches.
Intrusion detection is one essential tool towards building secure and trustworthy Cloud computing environment, given the ubiquitous presence of cyber attacks that proliferate rapidly and morph dynamically. In our current working paradigm of resource, platform and service consolidations, Cloud Computing provides a significant improvement in the cost metrics via dynamic provisioning of IT services. Since almost all cloud computing networks lean on providing their services through Internet, they are prone to experience variety of security issues. Therefore, in cloud environments, it is necessary to deploy an Intrusion Detection System (IDS) to detect new and unknown attacks in addition to signature based known attacks, with high accuracy. In our deliberation we assume that a system or a network ``anomalous'' event is synonymous to an ``intrusion'' event when there is a significant departure in one or more underlying system or network activities. There are couple of recently proposed ideas that aim to develop a hybrid detection mechanism, combining advantages of signature-based detection schemes with the ability to detect unknown attacks based on anomalies. In this work, we propose a network based anomaly detection system at the Cloud Hypervisor level that utilizes a hybrid algorithm: a combination of K-means clustering algorithm and SVM classification algorithm, to improve the accuracy of the anomaly detection system. Dataset from UNSW-NB15 study is used to evaluate the proposed approach and results are compared with previous studies. The accuracy for our proposed K-means clustering model is slightly higher than others. However, the accuracy we obtained from the SVM model is still low for supervised techniques.
Deep Learning is an area of Machine Learning research, which can be used to manipulate large amount of information in an intelligent way by using the functionality of computational intelligence. A deep learning system is a fully trainable system beginning from raw input to the final output of recognized objects. Feature selection is an important aspect of deep learning which can be applied for dimensionality reduction or attribute reduction and making the information more explicit and usable. Deep learning can build various learning models which can abstract unknown information by selecting a subset of relevant features. This property of deep learning makes it useful in analysis of highly complex information one which is present in intrusive data or information flowing with in a web system or a network which needs to be analyzed to detect anomalies. Our approach combines the intelligent ability of Deep Learning to build a smart Intrusion detection system.
Dendritic cell algorithm (DCA) is an immune-inspired classification algorithm which is developed for the purpose of anomaly detection in computer networks. The DCA uses a weighted function in its context detection phase to process three categories of input signals including safe, danger and pathogenic associated molecular pattern to three output context values termed as co-stimulatory, mature and semi-mature, which are then used to perform classification. The weighted function used by the DCA requires either manually pre-defined weights usually provided by the immunologists, or empirically derived weights from the training dataset. Neither of these is sufficiently flexible to work with different datasets to produce optimum classification result. To address such limitation, this work proposes an approach for computing the three output context values of the DCA by employing the recently proposed TSK+ fuzzy inference system, such that the weights are always optimal for the provided data set regarding a specific application. The proposed approach was validated and evaluated by applying it to the two popular datasets KDD99 and UNSW NB15. The results from the experiments demonstrate that, the proposed approach outperforms the conventional DCA in terms of classification accuracy.
With the ever-growing occurrence of networking attacks, robust network security systems are essential to prevent and mitigate their harming effects. In recent years, machine learning-based systems have gain popularity for network security applications, usually considering the application of shallow models, where a set of expert handcrafted features are needed to pre-process the data before training. The main problem with this approach is that handcrafted features can fail to perform well given different kinds of scenarios and problems. Deep Learning models can solve this kind of issues using their ability to learn feature representations from input raw or basic, non-processed data. In this paper we explore the power of deep learning models on the specific problem of detection and classification of malware network traffic, using different representations for the input data. As a major advantage as compared to the state of the art, we consider raw measurements coming directly from the stream of monitored bytes as the input to the proposed models, and evaluate different raw-traffic feature representations, including packet and flow-level ones. Our results suggest that deep learning models can better capture the underlying statistics of malicious traffic as compared to classical, shallow-like models, even while operating in the dark, i.e., without any sort of expert handcrafted inputs.
False data injection is an on-going concern facing power system state estimation. In this work, a neural network is trained to detect the existence of false data in measurements. The proposed approach can make use of historical data, if available, by using them in the training sets of the proposed neural network model. However, the inputs of perceptron model in this work are the residual elements from the state estimation, which are highly correlated. Therefore, their dimension could be reduced by preserving the most informative features from the inputs. To this end, principal component analysis is used (i.e., a data preprocessing technique). This technique is especially efficient for highly correlated data sets, which is the case in power system measurements. The results of different perceptron models that are proposed for detection, are compared to a simple perceptron that produces identical result to the outlier detection scheme. For generating the training sets, state estimation was run for different false data on different measurements in 13-bus IEEE test system, and the residuals are saved as inputs of training sets. The testing results of the trained network show its good performance in detection of false data in measurements.
Everyday., the DoS/DDoS attacks are increasing all over the world and the ways attackers are using changing continuously. This increase and variety on the attacks are affecting the governments, institutions, organizations and corporations in a bad way. Every successful attack is causing them to lose money and lose reputation in return. This paper presents an introduction to a method which can show what the attack and where the attack based on. This is tried to be achieved with using clustering algorithm DBSCAN on network traffic because of the change and variety in attack vectors.
An important ingredient for a successful recipe for solving machine learning problems is the availability of a suitable dataset. However, such a dataset may have to be extracted from a large unstructured and semi-structured data like programming code, scripts, and text. In this work, we propose a plug-in based, extensible feature extraction framework for which we have prototyped as a tool. The proposed framework is demonstrated by extracting features from two different sources of semi-structured and unstructured data. The semi-structured data comprised of web page and script based data whereas the other data was taken from email data for spam filtering. The usefulness of the tool was also assessed on the aspect of ease of programming.
This paper studies the principle of vulnerability generation and mechanism of cross-site scripting attack, designs a dynamic cross-site scripting vulnerabilities detection technique based on existing theories of black box vulnerabilities detection. The dynamic detection process contains five steps: crawler, feature construct, attacks simulation, results detection and report generation. Crawling strategy in crawler module and constructing algorithm in feature construct module are key points of this detection process. Finally, according to the detection technique proposed in this paper, a detection tool is accomplished in Linux using python language to detect web applications. Experiments were launched to verify the results and compare with the test results of other existing tools, analyze the usability, advantages and disadvantages of the detection method above, confirm the feasibility of applying dynamic detection technique to cross-site scripting vulnerabilities detection.