Biblio
Modern industrial control systems (ICS) act as victims of cyber attacks more often in last years. These attacks are hard to detect and their consequences can be catastrophic. Cyber attacks can cause anomalies in the work of the ICS and its technological equipment. The presence of mutual interference and noises in this equipment significantly complicates anomaly detection. Moreover, the traditional means of protection, which used in corporate solutions, require updating with each change in the structure of the industrial process. An approach based on the machine learning for anomaly detection was used to overcome these problems. It complements traditional methods and allows one to detect signal correlations and use them for anomaly detection. Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation dataset was analyzed as example of industrial process. In the course of the research, correlations between the signals of the sensors were detected and preliminary data processing was carried out. Algorithms from the most common techniques of machine learning (decision trees, linear algorithms, support vector machines) and deep learning models (neural networks) were investigated for industrial process anomaly detection task. It's shown that linear algorithms are least demanding on computational resources, but they don't achieve an acceptable result and allow a significant number of errors. Decision tree-based algorithms provided an acceptable accuracy, but the amount of RAM, required for their operations, relates polynomially with the training sample volume. The deep neural networks provided the greatest accuracy, but they require considerable computing power for internal calculations.
With the exponential hike in cyber threats, organizations are now striving for better data mining techniques in order to analyze security logs received from their IT infrastructures to ensure effective and automated cyber threat detection. Machine Learning (ML) based analytics for security machine data is the next emerging trend in cyber security, aimed at mining security data to uncover advanced targeted cyber threats actors and minimizing the operational overheads of maintaining static correlation rules. However, selection of optimal machine learning algorithm for security log analytics still remains an impeding factor against the success of data science in cyber security due to the risk of large number of false-positive detections, especially in the case of large-scale or global Security Operations Center (SOC) environments. This fact brings a dire need for an efficient machine learning based cyber threat detection model, capable of minimizing the false detection rates. In this paper, we are proposing optimal machine learning algorithms with their implementation framework based on analytical and empirical evaluations of gathered results, while using various prediction, classification and forecasting algorithms.
The Machine Type Communication Devices (MTCDs) are usually based on Internet Protocol (IP), which can cause billions of connected objects to be part of the Internet. The enormous amount of data coming from these devices are quite heterogeneous in nature, which can lead to security issues, such as injection attacks, ballot stuffing, and bad mouthing. Consequently, this work considers machine learning trust evaluation as an effective and accurate option for solving the issues associate with security threats. In this paper, a comparative analysis is carried out with five different machine learning approaches: Naive Bayes (NB), Decision Tree (DT), Linear and Radial Support Vector Machine (SVM), KNearest Neighbor (KNN), and Random Forest (RF). As a critical element of the research, the recommendations consider different Machine-to-Machine (M2M) communication nodes with regard to their ability to identify malicious and honest information. To validate the performances of these models, two trust computation measures were used: Receiver Operating Characteristics (ROCs), Precision and Recall. The malicious data was formulated in Matlab. A scenario was created where 50% of the information were modified to be malicious. The malicious nodes were varied in the ranges of 10%, 20%, 30%, 40%, and the results were carefully analyzed.
Physical Unclonable Functions (PUFs) have been designed for many security applications such as identification, authentication of devices and key generation, especially for lightweight electronics. Traditional approaches to enhancing security, such as hash functions, may be expensive and resource dependent. However, modelling attacks using machine learning (ML) show the vulnerability of most PUFs. In this paper, a combination of a 32-bit current mirror and 16-bit arbiter PUFs in 65nm CMOS technology is proposed to improve resilience against modelling attacks. Both PUFs are vulnerable to machine learning attacks and we reduce the output prediction rate from 99.2% and 98.8% individually, to 60%.
Nowadays, most vendors apply the same open source code to their products, which is dangerous. In addition, when manufacturers release patches, they generally hide the exact location of the vulnerabilities. So, identifying vulnerabilities in binaries is crucial. However, just searching source program has a lower identifying accuracy of vulnerability, which requires operators further to differentiate searched results. Under this context, we propose VMPBL to enhance identifying the accuracy of vulnerability with the help of patch files. VMPBL, compared with other proposed schemes, uses patched functions according to its vulnerable functions in patch file to further distinguish results. We establish a prototype of VMPBL, which can effectively identify vulnerable function types and get rid of safe functions from results. Firstly, we get the potential vulnerable-patched functions by binary comparison technique based on K-Trace algorithm. Then we combine the functions with vulnerability and patch knowledge database to classify these function pairs and identify the possible vulnerable functions and the vulnerability types. Finally, we test some programs containing real-world CWE vulnerabilities, and one of the experimental results about CWE415 shows that the results returned from only searching source program are about twice as much as the results from VMPBL. We can see that using VMPBL can significantly reduce the false positive rate of discovering vulnerabilities compared with analyzing source files alone.
This computer era leads human to interact with computers and networks but there is no such solution to get rid of security problems. Securities threats misleads internet, we are sometimes losing our hope and reliability with many server based access. Even though many more crypto algorithms are coming for integrity and authentic data in computer access still there is a non reliable threat penetrates inconsistent vulnerabilities in networks. These vulnerable sites are taking control over the user's computer and doing harmful actions without user's privileges. Though Firewalls and protocols may support our browsers via setting certain rules, still our system couldn't support for data reliability and confidentiality. Since these problems are based on network access, lets we consider TCP/IP parameters as a dataset for analysis. By doing preprocess of TCP/IP packets we can build sovereign model on data set and clump cluster. Further the data set gets classified into regular traffic pattern and anonymous pattern using KNN classification algorithm. Based on obtained pattern for normal and threats data sets, security devices and system will set rules and guidelines to learn by it to take needed stroke. This paper analysis the computer to learn security actions from the given data sets which already exist in the previous happens.
With the evolution of network threat, identifying threat from internal is getting more and more difficult. To detect malicious insiders, we move forward a step and propose a novel attribute classification insider threat detection method based on long short term memory recurrent neural networks (LSTM-RNNs). To achieve high detection rate, event aggregator, feature extractor, several attribute classifiers and anomaly calculator are seamlessly integrated into an end-to-end detection framework. Using the CERT insider threat dataset v6.2 and threat detection recall as our performance metric, experimental results validate that the proposed threat detection method greatly outperforms k-Nearest Neighbor, Isolation Forest, Support Vector Machine and Principal Component Analysis based threat detection methods.
Insider threats pose a challenge to all companies and organizations. Identification of culprit after an attack is often too late and result in detrimental consequences for the organization. Majority of past research on insider threat has focused on post-hoc personality analysis of known insider threats to identify personality vulnerabilities. It has been proposed that certain personality vulnerabilities place individuals to be at risk to perpetuating insider threats should the environment and opportunity arise. To that end, this study utilizes a game-based approach to simulate a scenario of intellectual property theft and investigate behavioral and personality differences of individuals who exhibit insider-threat related behavior. Features were extracted from games, text collected through implicit and explicit measures, simultaneous facial expression recordings, and personality variables (HEXACO, Dark Triad and Entitlement Attitudes) calculated from questionnaire. We applied ensemble machine learning algorithms and show that they produce an acceptable balance of precision and recall. Our results showcase the possibility of harnessing personality variables, facial expressions and linguistic features in the modeling and prediction of insider-threat.
In big data era, machine learning is one of fundamental techniques in intrusion detection systems (IDSs). Poisoning attack, which is one of the most recognized security threats towards machine learning- based IDSs, injects some adversarial samples into the training phase, inducing data drifting of training data and a significant performance decrease of target IDSs over testing data. In this paper, we adopt the Edge Pattern Detection (EPD) algorithm to design a novel poisoning method that attack against several machine learning algorithms used in IDSs. Specifically, we propose a boundary pattern detection algorithm to efficiently generate the points that are near to abnormal data but considered to be normal ones by current classifiers. Then, we introduce a Batch-EPD Boundary Pattern (BEBP) detection algorithm to overcome the limitation of the number of edge pattern points generated by EPD and to obtain more useful adversarial samples. Based on BEBP, we further present a moderate but effective poisoning method called chronic poisoning attack. Extensive experiments on synthetic and three real network data sets demonstrate the performance of the proposed poisoning method against several well-known machine learning algorithms and a practical intrusion detection method named FMIFS-LSSVM-IDS.
An important source of cyber-attacks is malware, which proliferates in different forms such as botnets. The botnet malware typically looks for vulnerable devices across the Internet, rather than targeting specific individuals, companies or industries. It attempts to infect as many connected devices as possible, using their resources for automated tasks that may cause significant economic and social harm while being hidden to the user and device. Thus, it becomes very difficult to detect such activity. A considerable amount of research has been conducted to detect and prevent botnet infestation. In this paper, we attempt to create a foundation for an anomaly-based intrusion detection system using a statistical learning method to improve network security and reduce human involvement in botnet detection. We focus on identifying the best features to detect botnet activity within network traffic using a lightweight logistic regression model. The network traffic is processed by Bro, a popular network monitoring framework which provides aggregate statistics about the packets exchanged between a source and destination over a certain time interval. These statistics serve as features to a logistic regression model responsible for classifying malicious and benign traffic. Our model is easy to implement and simple to interpret. We characterized and modeled 8 different botnet families separately and as a mixed dataset. Finally, we measured the performance of our model on multiple parameters using F1 score, accuracy and Area Under Curve (AUC).
With the rapid development of the information industry, the applications of Internet of things, cloud computing and artificial intelligence have greatly affected people's life, and the network equipment has increased with a blowout type. At the same time, more complex network environment has also led to a more serious network security problem. The traditional security solution becomes inefficient in the new situation. Therefore, it is an important task for the security industry to seek technical progress and improve the protection detection and protection ability of the security industry. Botnets have been one of the most important issues in many network security problems, especially in the last one or two years, and China has become one of the most endangered countries by botnets, thus the huge impact of botnets in the world has caused its detection problems to reset people's attention. This paper, based on the topic of botnet detection, focuses on the latest research achievements of botnet detection based on machine learning technology. Firstly, it expounds the application process of machine learning technology in the research of network space security, introduces the structure characteristics of botnet, and then introduces the machine learning in botnet detection. The security features of these solutions and the commonly used machine learning algorithms are emphatically analyzed and summarized. Finally, it summarizes the existing problems in the existing solutions, and the future development direction and challenges of machine learning technology in the research of network space security.
Botnet is one of the major threats on the Internet for committing cybercrimes, such as DDoS attacks, stealing sensitive information, spreading spams, etc. It is a challenging issue to detect modern botnets that are continuously improving for evading detection. In this paper, we propose a machine learning based botnet detection system that is shown to be effective in identifying P2P botnets. Our approach extracts convolutional version of effective flow-based features, and trains a classification model by using a feed-forward artificial neural network. The experimental results show that the accuracy of detection using the convolutional features is better than the ones using the traditional features. It can achieve 94.7% of detection accuracy and 2.2% of false positive rate on the known P2P botnet datasets. Furthermore, our system provides an additional confidence testing for enhancing performance of botnet detection. It further classifies the network traffic of insufficient confidence in the neural network. The experiment shows that this stage can increase the detection accuracy up to 98.6% and decrease the false positive rate up to 0.5%.
Text-based CAPTCHAs are still commonly used to attempt to prevent automated access to web services. By displaying an image of distorted text, they attempt to create a challenge image that OCR software can not interpret correctly, but a human user can easily determine the correct response to. This work focuses on a CAPTCHA used by a popular Chinese language question-and-answer website and how resilient it is to modern machine learning methods. While the majority of text-based CAPTCHAs focus on transcription tasks, the CAPTCHA solved in this work is based on localization of inverted symbols in a distorted image. A convolutional neural network (CNN) was created to evaluate the likelihood of a region in the image belonging to an inverted character. It is used with a feature map and clustering to identify potential locations of inverted characters. Training of the CNN was performed using curriculum learning and compared to other potential training methods. The proposed method was able to determine the correct response in 95.2% of cases of a simulated CAPTCHA and 67.6% on a set of real CAPTCHAs. Potential methods to increase difficulty of the CAPTCHA and the success rate of the automated solver are considered.
The Internet of Things (IoT) is the network where physical devices, sensors, appliances and other different objects can communicate with each other without the need for human intervention. Wireless Sensor Networks (WSNs) are main building blocks of the IoT. Both the IoT and WSNs have many critical and non-critical applications that touch almost every aspect of our modern life. Unfortunately, these networks are prone to various types of security threats. Therefore, the security of IoT and WSNs became crucial. Furthermore, the resource limitations of the devices used in these networks complicate the problem. One of the most recent and effective approaches to address such challenges is machine learning. Machine learning inspires many solutions to secure the IoT and WSNs. In this paper, we survey the different threats that can attack both IoT and WSNs and the machine learning techniques developed to counter them.
Malicious traffic has garnered more attention in recent years, owing to the rapid growth of information technology in today's world. In 2007 alone, an estimated loss of 13 billion dollars was made from malware attacks. Malware data in today's context is massive. To understand such information using primitive methods would be a tedious task. In this publication we demonstrate some of the most advanced deep learning techniques available, multilayer perceptron (MLP) and J48 (also known as C4.5 or ID3) on our selected dataset, Advanced Security Network Metrics & Non-Payload-Based Obfuscations (ASNM-NPBO) to show that the answer to managing cyber security threats lie in the fore-mentioned methodologies.
To ensure reliable and predictable service in the electrical grid it is important to gauge the level of trust present within critical components and substations. Although trust throughout a smart grid is temporal and dynamically varies according to measured states, it is possible to accurately formulate communications and service level strategies based on such trust measurements. Utilizing an effective set of machine learning and statistical methods, it is shown that establishment of trust levels between substations using behavioral pattern analysis is possible. It is also shown that the establishment of such trust can facilitate simple secure communications routing between substations.
We present an effective machine learning method for malicious activity detection in enterprise security logs. Our method involves feature engineering, or generating new features by applying operators on features of the raw data. We generate DNF formulas from raw features, extract Boolean functions from them, and leverage Fourier analysis to generate new parity features and rank them based on their highest Fourier coefficients. We demonstrate on real enterprise data sets that the engineered features enhance the performance of a wide range of classifiers and clustering algorithms. As compared to classification of raw data features, the engineered features achieve up to 50.6% improvement in malicious recall, while sacrificing no more than 0.47% in accuracy. We also observe better isolation of malicious clusters, when performing clustering on engineered features. In general, a small number of engineered features achieve higher performance than raw data features according to our metrics of interest. Our feature engineering method also retains interpretability, an important consideration in cyber security applications.
Integrated circuits (ICs) are becoming vulnerable to hardware Trojans. Most of existing works require golden chips to provide references for hardware Trojan detection. However, a golden chip is extremely difficult to obtain. In previous work, we have proposed a classification-based golden chips-free hardware Trojan detection technique. However, the algorithm in the previous work are trained by simulated ICs without considering that there may be a shift which occurs between the simulation and the silicon fabrication. It is necessary to learn from actual silicon fabrication in order to obtain an accurate and effective classification model. We propose a co-training based hardware Trojan detection technique exploiting unlabeled fabricated ICs and inaccurate simulation models, to provide reliable detection capability when facing fabricated ICs, while eliminating the need of fabricated golden chips. First, we train two classification algorithms using simulated ICs. During test-time, the two algorithms can identify different patterns in the unlabeled ICs, and thus be able to label some of these ICs for the further training of the another algorithm. Moreover, we use a statistical examination to choose ICs labeling for the another algorithm in order to help prevent a degradation in performance due to the increased noise in the labeled ICs. We also use a statistical technique for combining the hypotheses from the two classification algorithms to obtain the final decision. The theoretical basis of why the co-training method can work is also described. Experiment results on benchmark circuits show that the proposed technique can detect unknown Trojans with high accuracy (92% 97%) and recall (88% 95%).