Biblio
Critical Infrastructures (CIs) use Supervisory Control And Data Acquisition (SCADA) systems for remote control and monitoring. Sophisticated security measures are needed to address malicious intrusions, which are steadily increasing in number and variety due to the massive spread of connectivity and standardisation of open SCADA protocols. Traditional Intrusion Detection Systems (IDSs) cannot detect attacks that are not already present in their databases. Therefore, in this paper, we assess Machine Learning (ML) for intrusion detection in SCADA systems using a real data set collected from a gas pipeline system and provided by the Mississippi State University (MSU). The contribution of this paper is two-fold: 1) The evaluation of four techniques for missing data estimation and two techniques for data normalization, 2) The performances of Support Vector Machine (SVM), and Random Forest (RF) are assessed in terms of accuracy, precision, recall and F1score for intrusion detection. Two cases are differentiated: binary and categorical classifications. Our experiments reveal that RF detect intrusions effectively, with an F1score of respectively \textbackslashtextgreater 99%.
For the security of mobile ad-hoc networks (MANETs), a group of wireless mobile nodes needs to cooperate by forwarding packets, to implement an intrusion detection system (IDS). Some of the current IDS implementations in a clustered MANET have designed mobile nodes to wait until the cluster head is elected before scanning the network and thus nodes may be, unfortunately, exposed to several control packet attacks by which nodes identify falsified routes to reach other nodes. In order to detect control packet attacks such as route falsification, we design a route cache sharing mechanism for a non-clustered network where all one-hop routing data are collected by each node for a cooperative host-based detection. The cooperative host-based detection system uses a Support Vector Machine classifier and achieves a detection rate of around 95%. By successfully detecting the route falsification attacks, nodes are given the capability to avoid other attacks such as black-hole and gray-hole, which are in many cases a result of a successful route falsification attack.
Traditional security controls, such as firewalls, anti-virus and IDS, are ill-equipped to help IT security and response teams keep pace with the rapid evolution of the cyber threat landscape. Cyber Threat Intelligence (CTI) can help remediate this problem by exploiting non-traditional information sources, such as hacker forums and "dark-web" social platforms. Security and response teams can use the collected intelligence to identify emerging threats. Unfortunately, when manual analysis is used to extract CTI from non-traditional sources, it is a time consuming, error-prone and resource intensive process. We address these issues by using a hybrid Machine Learning model that automatically searches through hacker forum posts, identifies the posts that are most relevant to cyber security and then clusters the relevant posts into estimations of the topics that the hackers are discussing. The first (identification) stage uses Support Vector Machines and the second (clustering) stage uses Latent Dirichlet Allocation. We tested our model, using data from an actual hacker forum, to automatically extract information about various threats such as leaked credentials, malicious proxy servers, malware that evades AV detection, etc. The results demonstrate our method is an effective means for quickly extracting relevant and actionable intelligence that can be integrated with traditional security controls to increase their effectiveness.
Cloud nowaday has become the backbone of the IT infrastructure. Whole of the infrastructure is now being shifted to the clouds, and as the cloud involves all of the networking schemes and the OS images, it inherits all of the vulnerabilities too. And hence securing them is one of our very prior concerns. Malwares are one of the many other problems that have ever growing and hence need to be eradicated from the system. The history of mal wares go long back in time since the advent of computers and hence a lot of techniques has also been already devised to tackle with the problem in some or other way. But most of them fall short in some or other way or are just too heavy to execute on a simple user machine. Our approach devises a 3 - phase exhaustive technique which confirms the detection of any kind of malwares from the host. It also works for the zero-day attacks that are really difficult to cover most times and can be of really high-risk at times. We have thought of a solution to keep the things light weight for the user.
Malicious applications have become increasingly numerous. This demands adaptive, learning-based techniques for constructing malware detection engines, instead of the traditional manual-based strategies. Prior work in learning-based malware detection engines primarily focuses on dynamic trace analysis and byte-level n-grams. Our approach in this paper differs in that we use compiler intermediate representations, i.e., the callgraph representation of binaries. Using graph-based program representations for learning provides structure of the program, which can be used to learn more advanced patterns. We use the Shortest Path Graph Kernel (SPGK) to identify similarities between call graphs extracted from binaries. The output similarity matrix is fed into a Support Vector Machine (SVM) algorithm to construct highly-accurate models to predict whether a binary is malicious or not. However, SPGK is computationally expensive due to the size of the input graphs. Therefore, we evaluate different parallelization methods for CPUs and GPUs to speed up this kernel, allowing us to continuously construct up-to-date models in a timely manner. Our hybrid implementation, which leverages both CPU and GPU, yields the best performance, achieving up to a 14.2x improvement over our already optimized OpenMP version. We compared our generated graph-based models to previously state-of-the-art feature vector 2-gram and 3-gram models on a dataset consisting of over 22,000 binaries. We show that our classification accuracy using graphs is over 19% higher than either n-gram model and gives a false positive rate (FPR) of less than 0.1%. We are also able to consider large call graphs and dataset sizes because of the reduced execution time of our parallelized SPGK implementation.
In recent years the use of wireless ad hoc networks has seen an increase of applications. A big part of the research has focused on Mobile Ad Hoc Networks (MAnETs), due to its implementations in vehicular networks, battlefield communications, among others. These peer-to-peer networks usually test novel communications protocols, but leave out the network security part. A wide range of attacks can happen as in wired networks, some of them being more damaging in MANETs. Because of the characteristics of these networks, conventional methods for detection of attack traffic are ineffective. Intrusion Detection Systems (IDSs) are constructed on various detection techniques, but one of the most important is anomaly detection. IDSs based only in past attacks signatures are less effective, even more if these IDSs are centralized. Our work focuses on adding a novel Machine Learning technique to the detection engine, which recognizes attack traffic in an online way (not to store and analyze after), re-writing IDS rules on the fly. Experiments were done using the Dockemu emulation tool with Linux Containers, IPv6 and OLSR as routing protocol, leading to promising results.
Nowadays, an increasing number of IoT vendors have complied and deployed third-party code bases across different architectures. Therefore, to avoid the firmware from being affected by the same known vulnerabilities, searching known vulnerabilities in binary firmware across different architectures is more crucial than ever. However, most of existing vulnerability search methods are limited to the same architecture, there are only a few researches on cross-architecture cases, of which the accuracy is not high. In this paper, to promote the accuracy of existing cross-architecture vulnerability search methods, we propose a new approach based on Support Vector Machine (SVM) and Attributed Control Flow Graph (ACFG) to search known vulnerability in firmware across different architectures at function level. We employ a known vulnerability function to recognize suspicious functions in other binary firmware. First, considering from the internal and external characteristics of the functions, we extract the function level features and basic-block level features of the functions to be inspected. Second, we employ SVM to recognize a little part of suspicious functions based on function level features. After the preliminary screening, we compute the graph similarity between the vulnerability function and suspicious functions based on their ACFGs. We have implemented our approach CVSSA, and employed the training samples to train the model with previous knowledge to improve the accuracy. We also search several vulnerabilities in the real-world firmware images, the experimental results show that CVSSA can be applied to the realistic scenarios.
Interval uncertainty can cause uncontrollable variations in the objective and constraint values, which could seriously deteriorate the performance or even change the feasibility of the optimal solutions. Robust optimization is to obtain solutions that are optimal and minimally sensitive to uncertainty. In this paper, a sequential multi-objective robust optimization (MORO) approach based on support vector machines (SVM) is proposed. Firstly, a sequential optimization structure is adopted to ease the computational burden. Secondly, SVM is used to construct a classification model to classify design alternatives into feasible or infeasible. The proposed approach is tested on a numerical example and an engineering case. Results illustrate that the proposed approach can reasonably approximate solutions obtained from the existing sequential MORO approach (SMORO), while the computational costs are significantly reduced compared with those of SMORO.
In this paper, we propose a new regularization scheme for the well-known Support Vector Machine (SVM) classifier that operates on the training sample level. The proposed approach is motivated by the fact that Maximum Margin-based classification defines decision functions as a linear combination of the selected training data and, thus, the variations on training sample selection directly affect generalization performance. We show that the exploitation of the proposed regularization scheme is well motivated and intuitive. Experimental results show that the proposed regularization scheme outperforms standard SVM in human action recognition tasks as well as classical recognition problems.
Network traffic identification has been a hot topic in network security area. The identification of abnormal traffic can detect attack traffic and helps network manager enforce corresponding security policies to prevent attacks. Support Vector Machines (SVMs) are one of the most promising supervised machine learning (ML) algorithms that can be applied to the identification of traffic in IP networks as well as detection of abnormal traffic. SVM shows better performance because it can avoid local optimization problems existed in many supervised learning algorithms. However, as a binary classification approach, SVM needs more research in multiclass classification. In this paper, we proposed an abnormal traffic identification system(ATIS) that can classify and identify multiple attack traffic applications. Each component of ATIS is introduced in detail and experiments are carried out based on ATIS. Through the test of KDD CUP dataset, SVM shows good performance. Furthermore, the comparison of experiments reveals that scaling and parameters has a vital impact on SVM training results.
Cross-VM attacks have emerged as a major threat on commercial clouds. These attacks commonly exploit hardware level leakages on shared physical servers. A co-located machine can readily feel the presence of a co-located instance with a heavy computational load through performance degradation due to contention on shared resources. Shared cache architectures such as the last level cache (LLC) have become a popular leakage source to mount cross-VM attack. By exploiting LLC leakages, researchers have already shown that it is possible to recover fine grain information such as cryptographic keys from popular software libraries. This makes it essential to verify implementations that handle sensitive data across the many versions and numerous target platforms, a task too complicated, error prone and costly to be handled by human beings. Here we propose a machine learning based technique to classify applications according to their cache access profiles. We show that with minimal and simple manual processing steps feature vectors can be used to train models using support vector machines to classify the applications with a high degree of success. The profiling and training steps are completely automated and do not require any inspection or study of the code to be classified. In native execution, we achieve a successful classification rate as high as 98% (L1 cache) and 78$\backslash$% (LLC) over 40 benchmark applications in the Phoronix suite with mild training. In the cross-VM setting on the noisy Amazon EC2 the success rate drops to 60$\backslash$% for a suite of 25 applications. With this initial study we demonstrate that it is possible to train meaningful models to successfully predict applications running in co-located instances.
The chips in working state have electromagnetic energy leakage problem. We offer a method to analyze the problem of electromagnetic leakage when the chip is running. We execute a sequence of addition and subtraction arithmetic instructions on FPGA chip, then we use the near-field probe to capture the chip leakage of electromagnetic signals. The electromagnetic signal is collected for analysis and processing, the parts of addition and subtraction are classified and identified by SVM. In this paper, for the problem of electromagnetic leakage, six sets of data were collected for analysis and processing. Good results were obtained by using this method.
The smart grid is an electrical grid that has a duplex communication. This communication is between the utility and the consumer. Digital system, automation system, computers and control are the various systems of Smart Grid. It finds applications in a wide variety of systems. Some of its applications have been designed to reduce the risk of power system blackout. Dynamic vulnerability assessment is done to identify, quantify, and prioritize the vulnerabilities in a system. This paper presents a novel approach for classifying the data into one of the two classes called vulnerable or non-vulnerable by carrying out Dynamic Vulnerability Assessment (DVA) based on some data mining techniques such as Multichannel Singular Spectrum Analysis (MSSA), and Principal Component Analysis (PCA), and a machine learning tool such as Support Vector Machine Classifier (SVM-C) with learning algorithms that can analyze data. The developed methodology is tested in the IEEE 57 bus, where the cause of vulnerability is transient instability. The results show that data mining tools can effectively analyze the patterns of the electric signals, and SVM-C can use those patterns for analyzing the system data as vulnerable or non-vulnerable and determines System Vulnerability Status.
The security and typical attack behavior of Modbus/TCP industrial network communication protocol are analyzed. The data feature of traffic flow is extracted through the operation mode of the depth analysis abnormal behavior, and the intrusion detection method based on the support vector machine (SVM) is designed. The method analyzes the data characteristics of abnormal communication behavior, and constructs the feature input structure and detection system based on SVM algorithm by using the direct behavior feature selection and abnormal behavior pattern feature construction. The experimental results show that the method can effectively improve the detection rate of abnormal behavior, and enhance the safety protection function of industrial network.
The aim of this research is to advance the user active authentication using keystroke dynamics. Through this research, we assess the performance and influence of various keystroke features on keystroke dynamics authentication systems. In particular, we investigate the performance of keystroke features on a subset of most frequently used English words. The performance of four features such as i) key duration, ii) flight time latency, iii) diagraph time latency, and iv) word total time duration are analyzed. Two machine learning techniques are employed for assessing keystroke authentications. The selected classification methods are support vector machine (SVM), and k-nearest neighbor classifier (K-NN). The logged experimental data are captured for 28 users. The experimental results show that key duration time offers the best performance result among all four keystroke features, followed by word total time.
The TRECVID report of 2010 [14] evaluated video shot boundary detectors as achieving "excellent performance on [hard] cuts and gradual transitions." Unfortunately, while re-evaluating the state of the art of the shot boundary detection, we found that they need to be improved because the characteristics of consumer-produced videos have changed significantly since the introduction of mobile gadgets, such as smartphones, tablets and outdoor activity purposed cameras, and video editing software has been evolving rapidly. In this paper, we evaluate the best-known approach on a contemporary, publicly accessible corpus, and present a method that achieves better performance, particularly on soft transitions. Our method combines color histograms with key point feature matching to extract comprehensive frame information. Two similarity metrics, one for individual frames and one for sets of frames, are defined based on graph cuts. These metrics are formed into temporal feature vectors on which a SVM is trained to perform the final segmentation. The evaluation on said "modern" corpus of relatively short videos yields a performance of 92% recall (at 89% precision) overall, compared to 69% (91%) of the best-known method.
This paper presents a framework to identify the authors of Thai online messages. The identification is based on 53 writing attributes and the selected algorithms are support vector machine (SVM) and C4.5 decision tree. Experimental results indicate that the overall accuracies achieved by the SVM and the C4.5 were 79% and 75%, respectively. This difference was not statistically significant (at 95% confidence interval). As for the performance of identifying individual authors, in some cases the SVM was clearly better than the C4.5. But there were also other cases where both of them could not distinguish one author from another.