Biblio
The increased reliance on the Internet and the corresponding surge in connectivity demand has led to a significant growth in Internet-of-Things (IoT) devices. The continued deployment of IoT devices has in turn led to an increase in network attacks due to the larger number of potential attack surfaces as illustrated by the recent reports that IoT malware attacks increased by 215.7% from 10.3 million in 2017 to 32.7 million in 2018. This illustrates the increased vulnerability and susceptibility of IoT devices and networks. Therefore, there is a need for proper effective and efficient attack detection and mitigation techniques in such environments. Machine learning (ML) has emerged as one potential solution due to the abundance of data generated and available for IoT devices and networks. Hence, they have significant potential to be adopted for intrusion detection for IoT environments. To that end, this paper proposes an optimized ML-based framework consisting of a combination of Bayesian optimization Gaussian Process (BO-GP) algorithm and decision tree (DT) classification model to detect attacks on IoT devices in an effective and efficient manner. The performance of the proposed framework is evaluated using the Bot-IoT-2018 dataset. Experimental results show that the proposed optimized framework has a high detection accuracy, precision, recall, and F-score, highlighting its effectiveness and robustness for the detection of botnet attacks in IoT environments.
We consider the problem of protecting cloud services from simultaneous white-box and black-box attacks. Recent research in cryptographic program obfuscation considers the problem of protecting the confidentiality of programs and any secrets in them. In this model, a provable program obfuscation solution makes white-box attacks to the program not more useful than black-box attacks. Motivated by very recent results showing successful black-box attacks to machine learning programs run by cloud servers, we propose and study the approach of augmenting the program obfuscation solution model so to achieve, in at least some class of application scenarios, program confidentiality in the presence of both white-box and black-box attacks.We propose and formally define encrypted-input program obfuscation, where a key is shared between the entity obfuscating the program and the entity encrypting the program's inputs. We believe this model might be of interest in practical scenarios where cloud programs operate over encrypted data received by associated sensors (e.g., Internet of Things, Smart Grid).Under standard intractability assumptions, we show various results that are not known in the traditional cryptographic program obfuscation model; most notably: Yao's garbled circuit technique implies encrypted-input program obfuscation hiding all gates of an arbitrary polynomial circuit; and very efficient encrypted-input program obfuscation for range membership programs and a class of machine learning programs (i.e., decision trees). The performance of the latter solutions has only a small constant overhead over the equivalent unobfuscated program.
The anonymity and decentralization of Bitcoin make it widely accepted in illegal transactions, such as money laundering, drug and weapon trafficking, gambling, to name a few, which has already caused significant security risk all around the world. The obvious de-anonymity approach that matches transaction addresses and users is not possible in practice due to limited annotated data set. In this paper, we divide addresses into four types, exchange, gambling, service, and general, and propose targeted addresses identification algorithms with high fault tolerance which may be employed in a wide range of applications. We use network representation learning to extract features and train imbalanced multi-classifiers. Experimental results validated the effectiveness of the proposed method.
Finite-state machine (FSM) is widely used as control unit in most digital designs. Many intellectual property protection and obfuscation techniques leverage on the exponential number of possible states and state transitions of large FSM to secure a physical design with the reason that it is challenging to retrieve the FSM design from its downstream design or physical implementation without knowledge of the design. In this paper, we postulate that this assumption may not be sustainable with big data analytics. We demonstrate by applying a data mining technique to analyze sufficiently large amount of data collected from a full scan design to identify its FSM state registers. An impact metric is introduced to discriminate FSM state registers from other registers. A decision tree algorithm is constructed from the scan data for the regression analysis of the dependency of other registers on a chosen register to deduce its impact. The registers with the greater impact are more likely to be the FSM state registers. The proposed scheme is applied on several complex designs from OpenCores. The experiment results show the feasibility of our scheme in correctly identifying most FSM state registers with a high hit rate for a large majority of the designs.
Machine learning is a major area in artificial intelligence, which enables computer to learn itself explicitly without programming. As machine learning is widely used in making decision automatically, attackers have strong intention to manipulate the prediction generated my machine learning model. In this paper we study about the different types of attacks and its countermeasures on machine learning model. By research we found that there are many security threats in various algorithms such as K-nearest-neighbors (KNN) classifier, random forest, AdaBoost, support vector machine (SVM), decision tree, we revisit existing security threads and check what are the possible countermeasures during the training and prediction phase of machine learning model. In machine learning model there are 2 types of attacks that is causative attack which occurs during the training phase and exploratory attack which occurs during the prediction phase, we will also discuss about the countermeasures on machine learning model, the countermeasures are data sanitization, algorithm robustness enhancement, and privacy preserving techniques.
Proper evaluation of classifier predictive models requires the selection of appropriate metrics to gauge the effectiveness of a model's performance. The Area Under the Receiver Operating Characteristic Curve (AUC) has become the de facto standard metric for evaluating this classifier performance. However, recent studies have suggested that AUC is not necessarily the best metric for all types of datasets, especially those in which there exists a high or severe level of class imbalance. There is a need to assess which specific metrics are most beneficial to evaluate the performance of highly imbalanced big data. In this work, we evaluate the performance of eight machine learning techniques on a severely imbalanced big dataset pertaining to the cyber security domain. We analyze the behavior of six different metrics to determine which provides the best representation of a model's predictive performance. We also evaluate the impact that adjusting the classification threshold has on our metrics. Our results find that the C4.5N decision tree is the optimal learner when evaluating all presented metrics for severely imbalanced Slow HTTP DoS attack data. Based on our results, we propose that the use of AUC alone as a primary metric for evaluating highly imbalanced big data may be ineffective, and the evaluation of metrics such as F-measure and Geometric mean can offer substantial insight into the true performance of a given model.
Malware is one of the threats to information security that continues to increase. In 2014 nearly six million new malware was recorded. The highest number of malware is in Trojan Horse malware while in Adware malware is the most significantly increased malware. Security system devices such as antivirus, firewall, and IDS signature-based are considered to fail to detect malware. This happens because of the very fast spread of computer malware and the increasing number of signatures. Besides signature-based security systems it is difficult to identify new methods, viruses or worms used by attackers. One other alternative in detecting malware is to use honeypot with machine learning. Honeypot can be used as a trap for packages that are suspected while machine learning can detect malware by classifying classes. Decision Tree and Support Vector Machine (SVM) are used as classification algorithms. In this paper, we propose architectural design as a solution to detect malware. We presented the architectural proposal and explained the experimental method to be used.
Widespread use of Wireless Sensor Networks (WSNs) introduced many security threats due to the nature of such networks, particularly limited hardware resources and infrastructure less nature. Denial of Service attack is one of the most common types of attacks that face such type of networks. Building an Intrusion Detection and Prevention System to mitigate the effect of Denial of Service attack is not an easy task. This paper proposes the use of two machine learning techniques, namely decision trees and Support Vector Machines, to detect attack signature on a specialized dataset. The used dataset contains regular profiles and several Denial of Service attack scenarios in WSNs. The experimental results show that decision trees technique achieved better (higher) true positive rate and better (lower) false positive rate than Support Vector Machines, 99.86% vs 99.62%, and 0.05% vs. 0.09%, respectively.
Comment spam is one of the great challenges faced by forum administrators. Detecting and blocking comment spam can relieve the load on servers, improve user experience and purify the network conditions. This paper focuses on the detection of comment spam. The behaviors of spammer and the content of spam were analyzed. According to analysis results, two types of effective features are extracted which can make a better description of spammer characteristics. Additionally, a gradient boosting tree algorithm was used to construct the comment spam detector based on the extracted features. Our proposed method is examined on a blog spam dataset which was published by previous research, and the result illustrates that our method performs better than the previous method on detection accuracy. Moreover, the CPU time is recorded to demonstrate that the time spent on both training and testing maintains a small value.
The current authentication systems based on password and pin code are not enough to guarantee attacks from malicious users. For this reason, in the last years, several studies are proposed with the aim to identify the users basing on their typing dynamics. In this paper, we propose a deep neural network architecture aimed to discriminate between different users using a set of keystroke features. The idea behind the proposed method is to identify the users silently and continuously during their typing on a monitored system. To perform such user identification effectively, we propose a feature model able to capture the typing style that is specific to each given user. The proposed approach is evaluated on a large dataset derived by integrating two real-world datasets from existing studies. The merged dataset contains a total of 1530 different users each writing a set of different typing samples. Several deep neural networks, with an increasing number of hidden layers and two different sets of features, are tested with the aim to find the best configuration. The final best classifier scores a precision equal to 0.997, a recall equal to 0.99 and an accuracy equal to 99% using an MLP deep neural network with 9 hidden layers. Finally, the performances obtained by using the deep learning approach are also compared with the performance of traditional decision-trees machine learning algorithm, attesting the effectiveness of the deep learning-based classifiers in the domain of keystroke analysis.
The computer network is used by billions of people worldwide for variety of purposes. This has made the security increasingly important in networks. It is essential to use Intrusion Detection Systems (IDS) and devices whose main function is to detect anomalies in networks. Mostly all the intrusion detection approaches focuses on the issues of boosting techniques since results are inaccurate and results in lengthy detection process. The major pitfall in network based intrusion detection is the wide-ranging volume of data gathered from the network. In this paper, we put forward a hybrid anomaly based intrusion detection system which uses Classification and Boosting technique. The Paper is organized in such a way it compares the performance three different Classifiers along with boosting. Boosting process maximizes classification accuracy. Results of proposed scheme will analyzed over different datasets like Intrusion Detection Kaggle Dataset and NSL KDD. Out of vast analysis it is found Random tree provides best average Accuracy rate of around 99.98%, Detection rate of 98.79% and a minimum False Alarm rate.
Since the term “Fog Computing” has been coined by Cisco Systems in 2012, security and privacy issues of this promising paradigm are still open challenges. Among various security challenges, Access Control is a crucial concern for all cloud computing-like systems (e.g. Fog computing, Mobile edge computing) in the IoT era. Therefore, assigning the precise level of access in such an inherently scalable, heterogeneous and dynamic environment is not easy to perform. This work defines the uncertainty challenge for authentication phase of the access control in fog computing because on one hand fog has a number of characteristics that amplify uncertainty in authentication and on the other hand applying traditional access control models does not result in a flexible and resilient solution. Therefore, we have proposed a novel prediction model based on the extension of Attribute Based Access Control (ABAC) model. Our data-driven model is able to handle uncertainty in authentication. It is also able to consider the mobility of mobile edge devices in order to handle authentication. In doing so, we have built our model using and comparing four supervised classification algorithms namely as Decision Tree, Naïve Bayes, Logistic Regression and Support Vector Machine. Our model can achieve authentication performance with 88.14% accuracy using Logistic Regression.