Biblio
The enormous growth of Internet-based traffic exposes corporate networks with a wide variety of vulnerabilities. Intrusive traffics are affecting the normal functionality of network's operation by consuming corporate resources and time. Efficient ways of identifying, protecting, and mitigating from intrusive incidents enhance productivity. As Intrusion Detection System (IDS) is hosted in the network and at the user machine level to oversee the malicious traffic in the network and at the individual computer, it is one of the critical components of a network and host security. Unsupervised anomaly traffic detection techniques are improving over time. This research aims to find an efficient classifier that detects anomaly traffic from NSL-KDD dataset with high accuracy level and minimal error rate by experimenting with five machine learning techniques. Five binary classifiers: Stochastic Gradient Decent, Random Forests, Logistic Regression, Support Vector Machine, and Sequential Model are tested and validated to produce the result. The outcome demonstrates that Random Forest Classifier outperforms the other four classifiers with and without applying the normalization process to the dataset.
In order to examine malicious activity that occurs in a network or a system, intrusion detection system is used. Intrusion Detection is software or a device that scans a system or a network for a distrustful activity. Due to the growing connectivity between computers, intrusion detection becomes vital to perform network security. Various machine learning techniques and statistical methodologies have been used to build different types of Intrusion Detection Systems to protect the networks. Performance of an Intrusion Detection is mainly depends on accuracy. Accuracy for Intrusion detection must be enhanced to reduce false alarms and to increase the detection rate. In order to improve the performance, different techniques have been used in recent works. Analyzing huge network traffic data is the main work of intrusion detection system. A well-organized classification methodology is required to overcome this issue. This issue is taken in proposed approach. Machine learning techniques like Support Vector Machine (SVM) and Naïve Bayes are applied. These techniques are well-known to solve the classification problems. For evaluation of intrusion detection system, NSL- KDD knowledge discovery Dataset is taken. The outcomes show that SVM works better than Naïve Bayes. To perform comparative analysis, effective classification methods like Support Vector Machine and Naive Bayes are taken, their accuracy and misclassification rate get calculated.
Since the term “Fog Computing” has been coined by Cisco Systems in 2012, security and privacy issues of this promising paradigm are still open challenges. Among various security challenges, Access Control is a crucial concern for all cloud computing-like systems (e.g. Fog computing, Mobile edge computing) in the IoT era. Therefore, assigning the precise level of access in such an inherently scalable, heterogeneous and dynamic environment is not easy to perform. This work defines the uncertainty challenge for authentication phase of the access control in fog computing because on one hand fog has a number of characteristics that amplify uncertainty in authentication and on the other hand applying traditional access control models does not result in a flexible and resilient solution. Therefore, we have proposed a novel prediction model based on the extension of Attribute Based Access Control (ABAC) model. Our data-driven model is able to handle uncertainty in authentication. It is also able to consider the mobility of mobile edge devices in order to handle authentication. In doing so, we have built our model using and comparing four supervised classification algorithms namely as Decision Tree, Naïve Bayes, Logistic Regression and Support Vector Machine. Our model can achieve authentication performance with 88.14% accuracy using Logistic Regression.
Critical Infrastructures (CIs) use Supervisory Control And Data Acquisition (SCADA) systems for remote control and monitoring. Sophisticated security measures are needed to address malicious intrusions, which are steadily increasing in number and variety due to the massive spread of connectivity and standardisation of open SCADA protocols. Traditional Intrusion Detection Systems (IDSs) cannot detect attacks that are not already present in their databases. Therefore, in this paper, we assess Machine Learning (ML) for intrusion detection in SCADA systems using a real data set collected from a gas pipeline system and provided by the Mississippi State University (MSU). The contribution of this paper is two-fold: 1) The evaluation of four techniques for missing data estimation and two techniques for data normalization, 2) The performances of Support Vector Machine (SVM), and Random Forest (RF) are assessed in terms of accuracy, precision, recall and F1score for intrusion detection. Two cases are differentiated: binary and categorical classifications. Our experiments reveal that RF detect intrusions effectively, with an F1score of respectively \textbackslashtextgreater 99%.
In recent years, deep convolution neural networks (DCNNs) have won many contests in machine learning, object detection, and pattern recognition. Furthermore, deep learning techniques achieved exceptional performance in image classification, reaching accuracy levels beyond human capability. Malware variants from similar categories often contain similarities due to code reuse. Converting malware samples into images can cause these patterns to manifest as image features, which can be exploited for DCNN classification. Techniques for converting malware binaries into images for visualization and classification have been reported in the literature, and while these methods do reach a high level of classification accuracy on training datasets, they tend to be vulnerable to overfitting and perform poorly on previously unseen samples. In this paper, we explore and document a variety of techniques for representing malware binaries as images with the goal of discovering a format best suited for deep learning. We implement a database for malware binaries from several families, stored in hexadecimal format. These malware samples are converted into images using various approaches and are used to train a neural network to recognize visual patterns in the input and classify malware based on the feature vectors. Each image type is assessed using a variety of learning models, such as transfer learning with existing DCNN architectures and feature extraction for support vector machine classifier training. Each technique is evaluated in terms of classification accuracy, result consistency, and time per trial. Our preliminary results indicate that improved image representation has the potential to enable more effective classification of new malware.
Iris recognition is one of the most reliable biometrics for identification purpose in terms of reliability and accuracy. Hence, in this research the integration of cancelable biometrics features for iris recognition using encryption and decryption non-invertible transformation is proposed. Here, the biometric data is protected via the proposed cancelable biometrics method. The experimental results showed that the recognition rate achieved is 99.9% using Bath-A dataset with a maximum decision criterion of 0.97 along with acceptable processing time.
With the frequent use of Wi-Fi and hotspots that provide a wireless Internet environment, awareness and threats to wireless AP (Access Point) security are steadily increasing. Especially when using unauthorized APs in company, government and military facilities, there is a high possibility of being subjected to various viruses and hacking attacks. It is necessary to detect unauthorized Aps for protection of information. In this paper, we use RTT (Round Trip Time) value data set to detect authorized and unauthorized APs in wired / wireless integrated environment, analyze them using machine learning algorithms including SVM (Support Vector Machine), C4.5, KNN (K Nearest Neighbors) and MLP (Multilayer Perceptron). Overall, KNN shows the highest accuracy.
Affective1 engineering is a methodology of designing products by collecting customer affective needs and translating them into product designs. It usually begins with questionnaire surveys to collect customer affective demands and responses. However, this process is expensive, which can only be conducted periodically in a small scale. With the rapid development of e-commerce, a larger number of customer product reviews are available on the Internet. Many studies have been done using opinion mining and sentiment analysis. However, the existing studies focus on the polarity classification from a single perspective (such as positive and negative). The classification of multiple affective attributes receives less attention. In this paper, 3-class classifications of four different affective attributes (i.e. Soft-Hard, Appealing-Unappealing, Handy-Bulky, and Reliable-Shoddy) are performed by using two classical machine learning algorithms (i.e. Softmax regression and Support Vector Machine) and two deep learning methods (i.e. Restricted Boltzmann machines and Deep Belief Network) on an Amazon dataset. The results show that the accuracy of deep learning methods is above 90%, while the accuracy of classical machine learning methods is about 64%. This indicates that deep learning methods are significantly better than classical machine learning methods.
The monitoring circuit is widely applied in radiation environment and it is of significance to study the circuit reliability with the radiation effects. In this paper, an intelligent analysis method based on Deep Belief Network (DBN) and Support Vector Method is proposed according to the radiation experiments analysis of the monitoring circuit. The Total Ionizing Dose (TID) of the monitoring circuit is used to identify the circuit degradation trend. Firstly, the output waveforms of the monitoring circuit are obtained by radiating with the different TID. Subsequently, the Deep Belief Network Model is trained to extract the features of the circuit signal. Finally, the Support Vector Machine (SVM) and Support Vector Regression (SVR) are applied to classify and predict the remaining useful life (RUL) of the monitoring circuit. According to the experimental results, the performance of DBN-SVM exceeds DBN method for feature extraction and classification, and SVR is effective for predicting the degradation.
It is well known that distributed cyber attacks simultaneously launched from many hosts have caused the most serious problems in recent years including problems of privacy leakage and denial of services. Thus, how to detect those attacks at early stage has become an important and urgent topic in the cyber security community. For this purpose, recognizing C&C (Command & Control) communication between compromised bots and the C&C server becomes a crucially important issue, because C&C communication is in the preparation phase of distributed attacks. Although attack detection based on signature has been practically applied since long ago, it is well-known that it cannot efficiently deal with new kinds of attacks. In recent years, ML(Machine learning)-based detection methods have been studied widely. In those methods, feature selection is obviously very important to the detection performance. We once utilized up to 55 features to pick out C&C traffic in order to accomplish early detection of DDoS attacks. In this work, we try to answer the question that "Are all of those features really necessary?" We mainly investigate how the detection performance moves as the features are removed from those having lowest importance and we try to make it clear that what features should be payed attention for early detection of distributed attacks. We use honeypot data collected during the period from 2008 to 2013. SVM(Support Vector Machine) and PCA(Principal Component Analysis) are utilized for feature selection and SVM and RF(Random Forest) are for building the classifier. We find that the detection performance is generally getting better if more features are utilized. However, after the number of features has reached around 40, the detection performance will not change much even more features are used. It is also verified that, in some specific cases, more features do not always means a better detection performance. We also discuss 10 important features which have the biggest influence on classification.
Software Defined Network (SDN) architecture is a new and novel way of network management mechanism. In SDN, switches do not process the incoming packets like conventional network computing environment. They match for the incoming packets in the forwarding tables and if there is none it will be sent to the controller for processing which is the operating system of the SDN. A Distributed Denial of Service (DDoS) attack is a biggest threat to cyber security in SDN network. The attack will occur at the network layer or the application layer of the compromised systems that are connected to the network. In this paper a machine learning based intelligent method is proposed which can detect the incoming packets as infected or not. The different machine learning algorithms adopted for accomplishing the task are Naive Bayes, K-Nearest neighbor (KNN) and Support vector machine (SVM) to detect the anomalous behavior of the data traffic. These three algorithms are compared according to their performances and KNN is found to be the suitable one over other two. The performance measure is taken here is the detection rate of infected packets.
Two-factor authentication (2FA) popularly works by verifying something the user knows (a password) and something she possesses (a token, popularly instantiated with a smart phone). Conventional 2FA systems require extra interaction like typing a verification code, which is not very user-friendly. For improved user experience, recent work aims at zero-effort 2FA, in which a smart phone placed close to a computer (where the user enters her username/password into a browser to log into a server) automatically assists with the authentication. To prove her possession of the smart phone, the user needs to prove the phone is on the login spot, which reduces zero-effort 2FA to co-presence detection. In this paper, we propose SoundAuth, a secure zero-effort 2FA mechanism based on (two kinds of) ambient audio signals. SoundAuth looks for signs of proximity by having the browser and the smart phone compare both their surrounding sounds and certain unpredictable near-ultrasounds; if significant distinguishability is found, SoundAuth rejects the login request. For the ambient signals comparison, we regard it as a classification problem and employ a machine learning technique to analyze the audio signals. Experiments with real login attempts show that SoundAuth not only is comparable to existent schemes concerning utility, but also outperforms them in terms of resilience to attacks. SoundAuth can be easily deployed as it is readily supported by most smart phones and major browsers.
To prevent users' privacy from leakage, more and more mobile devices employ biometric-based authentication approaches, such as fingerprint, face recognition, voiceprint authentications, etc., to enhance the privacy protection. However, these approaches are vulnerable to replay attacks. Although state-of-art solutions utilize liveness verification to combat the attacks, existing approaches are sensitive to ambient environments, such as ambient lights and surrounding audible noises. Towards this end, we explore liveness verification of user authentication leveraging users' lip movements, which are robust to noisy environments. In this paper, we propose a lip reading-based user authentication system, LipPass, which extracts unique behavioral characteristics of users' speaking lips leveraging build-in audio devices on smartphones for user authentication. We first investigate Doppler profiles of acoustic signals caused by users' speaking lips, and find that there are unique lip movement patterns for different individuals. To characterize the lip movements, we propose a deep learning-based method to extract efficient features from Doppler profiles, and employ Support Vector Machine and Support Vector Domain Description to construct binary classifiers and spoofer detectors for user identification and spoofer detection, respectively. Afterwards, we develop a binary tree-based authentication approach to accurately identify each individual leveraging these binary classifiers and spoofer detectors with respect to registered users. Through extensive experiments involving 48 volunteers in four real environments, LipPass can achieve 90.21% accuracy in user identification and 93.1% accuracy in spoofer detection.
Industrial control systems (ICS) used in industrial plants are vulnerable to cyber-attacks that can cause fatal damage to the plants. Intrusion detection systems (IDSs) monitor ICS network traffic and detect suspicious activities. However, many IDSs overlook sophisticated cyber-attacks because it is hard to make a complete database of cyber-attacks and distinguish operational anomalies when compared to an established baseline. In this paper, a discriminant model between normal and anomalous packets was constructed with a support vector machine (SVM) based on an ICS communication profile, which represents only packet intervals and length, and an IDS with the applied model is proposed. Furthermore, the proposed IDS was evaluated using penetration tests on our cyber security test bed. Although the IDS was constructed by the limited features (intervals and length) of packets, the IDS successfully detected cyber-attacks by monitoring the rate of predicted attacking packets.
Malicious applications have become increasingly numerous. This demands adaptive, learning-based techniques for constructing malware detection engines, instead of the traditional manual-based strategies. Prior work in learning-based malware detection engines primarily focuses on dynamic trace analysis and byte-level n-grams. Our approach in this paper differs in that we use compiler intermediate representations, i.e., the callgraph representation of binaries. Using graph-based program representations for learning provides structure of the program, which can be used to learn more advanced patterns. We use the Shortest Path Graph Kernel (SPGK) to identify similarities between call graphs extracted from binaries. The output similarity matrix is fed into a Support Vector Machine (SVM) algorithm to construct highly-accurate models to predict whether a binary is malicious or not. However, SPGK is computationally expensive due to the size of the input graphs. Therefore, we evaluate different parallelization methods for CPUs and GPUs to speed up this kernel, allowing us to continuously construct up-to-date models in a timely manner. Our hybrid implementation, which leverages both CPU and GPU, yields the best performance, achieving up to a 14.2x improvement over our already optimized OpenMP version. We compared our generated graph-based models to previously state-of-the-art feature vector 2-gram and 3-gram models on a dataset consisting of over 22,000 binaries. We show that our classification accuracy using graphs is over 19% higher than either n-gram model and gives a false positive rate (FPR) of less than 0.1%. We are also able to consider large call graphs and dataset sizes because of the reduced execution time of our parallelized SPGK implementation.