Biblio
Industrial control systems (ICSs) are used in various infrastructures and industrial plants for realizing their control operation and ensuring their safety. Concerns about the cybersecurity of industrial control systems have raised due to the increased number of cyber-attack incidents on critical infrastructures in the light of the advancement in the cyber activity of ICSs. Nevertheless, the operation of the industrial control systems is bind to vital aspects in life, which are safety, economy, and security. This paper presents a semi-supervised, hybrid attack detection approach for industrial control systems by combining Isolation Forest and Convolutional Neural Network (CNN) models. The proposed framework is developed using the normal operational data, and it is composed of a feature extraction model implemented using a One-Dimensional Convolutional Neural Network (1D-CNN) and an isolation forest model for the detection. The two models are trained independently such that the feature extraction model aims to extract useful features from the continuous-time signals that are then used along with the binary actuator signals to train the isolation forest-based detection model. The proposed approach is applied to a down-scaled industrial control system, which is a water treatment plant known as the Secure Water Treatment (SWaT) testbed. The performance of the proposed method is compared with the other works using the same testbed, and it shows an improvement in terms of the detection capability.
Android, being the most widespread mobile operating systems is increasingly becoming a target for malware. Malicious apps designed to turn mobile devices into bots that may form part of a larger botnet have become quite common, thus posing a serious threat. This calls for more effective methods to detect botnets on the Android platform. Hence, in this paper, we present a deep learning approach for Android botnet detection based on Convolutional Neural Networks (CNN). Our proposed botnet detection system is implemented as a CNN-based model that is trained on 342 static app features to distinguish between botnet apps and normal apps. The trained botnet detection model was evaluated on a set of 6,802 real applications containing 1,929 botnets from the publicly available ISCX botnet dataset. The results show that our CNN-based approach had the highest overall prediction accuracy compared to other popular machine learning classifiers. Furthermore, the performance results observed from our model were better than those reported in previous studies on machine learning based Android botnet detection.
To ensure quality of service and user experience, large Internet companies often monitor various Key Performance Indicators (KPIs) of their systems so that they can detect anomalies and identify failure in real time. However, due to a large number of various KPIs and the lack of high-quality labels, existing KPI anomaly detection approaches either perform well only on certain types of KPIs or consume excessive resources. Therefore, to realize generic and practical KPI anomaly detection in the real world, we propose a KPI anomaly detection framework named iRRCF-Active, which contains an unsupervised and white-box anomaly detector based on Robust Random Cut Forest (RRCF), and an active learning component. Specifically, we novelly propose an improved RRCF (iRRCF) algorithm to overcome the drawbacks of applying original RRCF in KPI anomaly detection. Besides, we also incorporate the idea of active learning to make our model benefit from high-quality labels given by experienced operators. We conduct extensive experiments on a large-scale public dataset and a private dataset collected from a large commercial bank. The experimental resulta demonstrate that iRRCF-Active performs better than existing traditional statistical methods, unsupervised learning methods and supervised learning methods. Besides, each component in iRRCF-Active has also been demonstrated to be effective and indispensable.
The decisions made by machines are increasingly comparable in predictive performance to those made by humans, but these decision making processes are often concealed as black boxes. Additional techniques are required to extract understanding, and one such category are explanation methods. This research compares the explanations of two popular forms of artificial intelligence; neural networks and random forests. Researchers in either field often have divided opinions on transparency, and comparing explanations may discover similar ground truths between models. Similarity can help to encourage trust in predictive accuracy alongside transparent structure and unite the respective research fields. This research explores a variety of simulated and real-world datasets that ensure fair applicability to both learning algorithms. A new heuristic explanation method that extends an existing technique is introduced, and our results show that this is somewhat similar to the other methods examined whilst also offering an alternative perspective towards least-important features.
The exponential growth rate of malware causes significant security concern in this digital era to computer users, private and government organizations. Traditional malware detection methods employ static and dynamic analysis, which are ineffective in identifying unknown malware. Malware authors develop new malware by using polymorphic and evasion techniques on existing malware and escape detection. Newly arriving malware are variants of existing malware and their patterns can be analyzed using the vision-based method. Malware patterns are visualized as images and their features are characterized. The alternative generation of class vectors and feature vectors using ensemble forests in multiple sequential layers is performed for classifying malware. This paper proposes a hybrid stacked multilayered ensembling approach which is robust and efficient than deep learning models. The proposed model outperforms the machine learning and deep learning models with an accuracy of 98.91%. The proposed system works well for small-scale and large-scale data since its adaptive nature of setting parameters (number of sequential levels) automatically. It is computationally efficient in terms of resources and time. The method uses very fewer hyper-parameters compared to deep neural networks.
In the field of network traffic analysis, Deep Packet Inspection (DPI) technology is widely used at present. However, the increase in network traffic has brought tremendous processing pressure on the DPI. Consequently, detection speed has become the bottleneck of the entire application. In order to speed up the traffic detection of DPI, a lot of research works have been applied to improve signature matching algorithms, which is the most influential factor in DPI performance. In this paper, we present a novel method from a different angle called Precisely Guided Signature Matching (PGSM). Instead of matching packets with signature directly, we use supervised learning to automate the rules of specific protocol in PGSM. By testing the performance of a packet in the rules, the target packet could be decided when and which signatures should be matched with. Thus, the PGSM method reduces the number of aimless matches which are useless and numerous. After proposing PGSM, we build a framework called PGSM-DPI to verify the effectiveness of guidance rules. The PGSM-DPI framework consists of PGSM method and open source DPI library. The framework is running on a distributed platform with better throughput and computational performance. Finally, the experimental results demonstrate that our PGSM-DPI can reduce 59.23% original DPI time and increase 21.31% throughput. Besides, all source codes and experimental results can be accessed on our GitHub.
With the rapidly increasing connectivity in cyberspace, Insider Threat is becoming a huge concern. Insider threat detection from system logs poses a tremendous challenge for human analysts. Analyzing log files of an organization is a key component of an insider threat detection and mitigation program. Emerging machine learning approaches show tremendous potential for performing complex and challenging data analysis tasks that would benefit the next generation of insider threat detection systems. However, with huge sets of heterogeneous data to analyze, applying machine learning techniques effectively and efficiently to such a complex problem is not straightforward. In this paper, we extract a concise set of features from the system logs while trying to prevent loss of meaningful information and providing accurate and actionable intelligence. We investigate two unsupervised anomaly detection algorithms for insider threat detection and draw a comparison between different structures of the system logs including daily dataset and periodically aggregated one. We use the generated anomaly score from the previous cycle as the trust score of each user fed to the next period's model and show its importance and impact in detecting insiders. Furthermore, we consider the psychometric score of users in our model and check its effectiveness in predicting insiders. As far as we know, our model is the first one to take the psychometric score of users into consideration for insider threat detection. Finally, we evaluate our proposed approach on CERT insider threat dataset (v4.2) and show how it outperforms previous approaches.
When vertically aligned carbon nanotube arrays (CNT forests) are heated by optical, electrical, or any other means, heat confinement in the lateral directions (i.e. perpendicular to the CNTs' axes), which stems from the anisotropic structure of the forest, is expected to play an important role. It has been found that, in spite of being primarily conductive along the CNTs' axes, focusing a laser beam on the sidewall of a CNT forest can lead to a highly localized hot region-an effect known as ``Heat Trap''-and efficient thermionic emission. This unusual heat confinement phenomenon has applications where the spread of heat has to be minimized, but electrical conduction is required, notably in energy conversion (e.g. vacuum thermionics and thermoelectrics). However, despite its strong scientific and practical importance, the existence and role of the lateral heat confinement in the Heat Trap effect have so far been elusive. In this work, for the first time, by using a rotating elliptical laser beam, we directly observe the existence of this lateral heat confinement and its corresponding effects on the unusual temperature rise during the Heat Trap effect.
Anomaly detection on security logs is receiving more and more attention. Authentication events are an important component of security logs, and being able to produce trustful and accurate predictions minimizes the effort of cyber-experts to stop false attacks. Observed events are classified into Normal, for legitimate user behavior, and Malicious, for malevolent actions. These classes are consistently excessively imbalanced which makes the classification problem harder; in the commonly used Los Alamos dataset, the malicious class comprises only 0.00033% of the total. This work proposes a novel method to extract advanced composite features, and a supervised learning technique for classifying authentication logs trustfully; the models are Random Forest, LogitBoost, Logistic Regression, and ultimately Majority Voting which leverages the predictions of the previous models and gives the final prediction for each authentication event. We measure the performance of our experiments by using the False Negative Rate and False Positive Rate. In overall we achieve 0 False Negative Rate (i.e. no attack was missed), and on average a False Positive Rate of 0.0019.
With the increase in the popularity of computerized online applications, the analysis, and detection of a growing number of newly discovered stealthy malware poses a significant challenge to the security community. Signature-based and behavior-based detection techniques are becoming inefficient in detecting new unknown malware. Machine learning solutions are employed to counter such intelligent malware and allow performing more comprehensive malware detection. This capability leads to an automatic analysis of malware behavior. The proposed oblique random forest ensemble learning technique is efficient for malware classification. The effectiveness of the proposed method is demonstrated with three malware classification datasets from various sources. The results are compared with other variants of decision tree learning models. The proposed system performs better than the existing system in terms of classification accuracy and false positive rate.
This paper presents an assessment of continuous verification using linguistic style as a cognitive biometric. In stylometry, it is widely known that linguistic style is highly characteristic of authorship using representations that capture authorial style at character, lexical, syntactic, and semantic levels. In this work, we provide a contrast to previous efforts by implementing a one-class classification problem using Isolation Forests. Our approach demonstrates the usefulness of this classifier for accurately verifying the genuine user, and yields recognition accuracy exceeding 98% using very small training samples of 50 and 100-character blocks.
Trying to solve the risk of data privacy disclosure in classification process, a Random Forest algorithm under differential privacy named DPRF-gini is proposed in the paper. In the process of building decision tree, the algorithm first disturbed the process of feature selection and attribute partition by using exponential mechanism, and then meet the requirement of differential privacy by adding Laplace noise to the leaf node. Compared with the original algorithm, Empirical results show that protection of data privacy is further enhanced while the accuracy of the algorithm is slightly reduced.