Biblio
The Internet of Things (IoT) is the network where physical devices, sensors, appliances and other different objects can communicate with each other without the need for human intervention. Wireless Sensor Networks (WSNs) are main building blocks of the IoT. Both the IoT and WSNs have many critical and non-critical applications that touch almost every aspect of our modern life. Unfortunately, these networks are prone to various types of security threats. Therefore, the security of IoT and WSNs became crucial. Furthermore, the resource limitations of the devices used in these networks complicate the problem. One of the most recent and effective approaches to address such challenges is machine learning. Machine learning inspires many solutions to secure the IoT and WSNs. In this paper, we survey the different threats that can attack both IoT and WSNs and the machine learning techniques developed to counter them.
While the rapid adaptation of mobile devices changes our daily life more conveniently, the threat derived from malware is also increased. There are lots of research to detect malware to protect mobile devices, but most of them adopt only signature-based malware detection method that can be easily bypassed by polymorphic and metamorphic malware. To detect malware and its variants, it is essential to adopt behavior-based detection for efficient malware classification. This paper presents a system that classifies malware by using common behavioral characteristics along with malware families. We measure the similarity between malware families with carefully chosen features commonly appeared in the same family. With the proposed similarity measure, we can classify malware by malware's attack behavior pattern and tactical characteristics. Also, we apply community detection algorithm to increase the modularity within each malware family network aggregation. To maintain high classification accuracy, we propose a process to derive the optimal weights of the selected features in the proposed similarity measure. During this process, we find out which features are significant for representing the similarity between malware samples. Finally, we provide an intuitive graph visualization of malware samples which is helpful to understand the distribution and likeness of the malware networks. In the experiment, the proposed system achieved 97% accuracy for malware classification and 95% accuracy for prediction by K-fold cross-validation using the real malware dataset.
Growing interest in eXplainable Artificial Intelligence (XAI) aims to make AI and machine learning more understandable to human users. However, most existing work focuses on new algorithms, and not on usability, practical interpretability and efficacy on real users. In this vision paper, we propose a new research area of eXplainable AI for Designers (XAID), specifically for game designers. By focusing on a specific user group, their needs and tasks, we propose a human-centered approach for facilitating game designers to co-create with AI/ML techniques through XAID. We illustrate our initial XAID framework through three use cases, which require an understanding both of the innate properties of the AI techniques and users' needs, and we identify key open challenges.
We present a novel method for static analysis in which we combine data-flow analysis with machine learning to detect SQL injection (SQLi) and Cross-Site Scripting (XSS) vulnerabilities in PHP applications. We assembled a dataset from the National Vulnerability Database and the SAMATE project, containing vulnerable PHP code samples and their patched versions in which the vulnerability is solved. We extracted features from the code samples by applying data-flow analysis techniques, including reaching definitions analysis, taint analysis, and reaching constants analysis. We used these features in machine learning to train various probabilistic classifiers. To demonstrate the effectiveness of our approach, we built a tool called WIRECAML, and compared our tool to other tools for vulnerability detection in PHP code. Our tool performed best for detecting both SQLi and XSS vulnerabilities. We also tried our approach on a number of open-source software applications, and found a previously unknown vulnerability in a photo-sharing web application.
Software Defined Networking (SDN) and Network Function Virtualisation (NFV) are transforming modern networks towards a service-oriented architecture. At the same time, the cybersecurity industry is rapidly adopting Machine Learning (ML) algorithms to improve detection and mitigation of complex attacks. Traditional intrusion detection systems perform signature-based detection, based on well-known malicious traffic patterns that signify potential attacks. The main drawback of this method is that attack patterns need to be known in advance and signatures must be preconfigured. Hence, typical systems fail to detect a zero-day attack or an attack with unknown signature. This work considers the use of machine learning for advanced anomaly detection, and specifically deploys the Apache Spot ML framework on an SDN/NFV-enabled testbed running cybersecurity services as Virtual Network Functions (VNFs). VNFs are used to capture traffic for ingestion by the ML algorithm and apply mitigation measures in case of a detected anomaly. Apache Spot utilises Latent Dirichlet Allocation to identify anomalous traffic patterns in Netflow, DNS and proxy data. The overall performance of Apache Spot is evaluated by deploying Denial of Service (Slowloris, BoNeSi) and a Data Exfiltration attack (iodine).
The Machine Type Communication Devices (MTCDs) are usually based on Internet Protocol (IP), which can cause billions of connected objects to be part of the Internet. The enormous amount of data coming from these devices are quite heterogeneous in nature, which can lead to security issues, such as injection attacks, ballot stuffing, and bad mouthing. Consequently, this work considers machine learning trust evaluation as an effective and accurate option for solving the issues associate with security threats. In this paper, a comparative analysis is carried out with five different machine learning approaches: Naive Bayes (NB), Decision Tree (DT), Linear and Radial Support Vector Machine (SVM), KNearest Neighbor (KNN), and Random Forest (RF). As a critical element of the research, the recommendations consider different Machine-to-Machine (M2M) communication nodes with regard to their ability to identify malicious and honest information. To validate the performances of these models, two trust computation measures were used: Receiver Operating Characteristics (ROCs), Precision and Recall. The malicious data was formulated in Matlab. A scenario was created where 50% of the information were modified to be malicious. The malicious nodes were varied in the ranges of 10%, 20%, 30%, 40%, and the results were carefully analyzed.
Machine learning (ML) models are often trained using private datasets that are very expensive to collect, or highly sensitive, using large amounts of computing power. The models are commonly exposed either through online APIs, or used in hardware devices deployed in the field or given to the end users. This provides an incentive for adversaries to steal these ML models as a proxy for gathering datasets. While API-based model exfiltration has been studied before, the theft and protection of machine learning models on hardware devices have not been explored as of now. In this work, we examine this important aspect of the design and deployment of ML models. We illustrate how an attacker may acquire either the model or the model architecture through memory probing, side-channels, or crafted input attacks, and propose (1) power-efficient obfuscation as an alternative to encryption, and (2) timing side-channel countermeasures.
Malware or Malicious Software, are an important threat to information technology society. Deep Neural Network has been recently achieving a great performance for the tasks of malware detection and classification. In this paper, we propose a convolutional gated recurrent neural network model that is capable of classifying malware to their respective families. The model is applied to a set of malware divided into 9 different families and that have been proposed during the Microsoft Malware Classification Challenge in 2015. The model shows an accuracy of 92.6% on the available dataset.
In this paper, we propose a deep learning framework for malware classification. There has been a huge increase in the volume of malware in recent years which poses a serious security threat to financial institutions, businesses and individuals. In order to combat the proliferation of malware, new strategies are essential to quickly identify and classify malware samples so that their behavior can be analyzed. Machine learning approaches are becoming popular for classifying malware, however, most of the existing machine learning methods for malware classification use shallow learning algorithms (e.g. SVM). Recently, Convolutional Neural Networks (CNN), a deep learning approach, have shown superior performance compared to traditional learning algorithms, especially in tasks such as image classification. Motivated by this success, we propose a CNN-based architecture to classify malware samples. We convert malware binaries to grayscale images and subsequently train a CNN for classification. Experiments on two challenging malware classification datasets, Malimg and Microsoft malware, demonstrate that our method achieves better than the state-of-the-art performance. The proposed method achieves 98.52% and 99.97% accuracy on the Malimg and Microsoft datasets respectively.
Nowadays, network is one of the essential parts of life, and lots of primary activities are performed by using the network. Also, network security plays an important role in the administrator and monitors the operation of the system. The intrusion detection system (IDS) is a crucial module to detect and defend against the malicious traffics before the system is affected. This system can extract the information from the network system and quickly indicate the reaction which provides real-time protection for the protected system. However, detecting malicious traffics is very complicating because of their large quantity and variants. Also, the accuracy of detection and execution time are the challenges of some detection methods. In this paper, we propose an IDS platform based on convolutional neural network (CNN) called IDS-CNN to detect DoS attack. Experimental results show that our CNN based DoS detection obtains high accuracy at most 99.87%. Moreover, comparisons with other machine learning techniques including KNN, SVM, and Naïve Bayes demonstrate that our proposed method outperforms traditional ones.
In the light of the information revolution, and the propagation of big social data, the dissemination of misleading information is certainly difficult to control. This is due to the rapid and intensive flow of information through unconfirmed sources under the propaganda and tendentious rumors. This causes confusion, loss of trust between individuals and groups and even between governments and their citizens. This necessitates a consolidation of efforts to stop penetrating of false information through developing theoretical and practical methodologies aim to measure the credibility of users of these virtual platforms. This paper presents an approach to domain-based prediction to user's trustworthiness of Online Social Networks (OSNs). Through incorporating three machine learning algorithms, the experimental results verify the applicability of the proposed approach to classify and predict domain-based trustworthy users of OSNs.
Short Message Service is now-days the most used way of communication in the electronic world. While many researches exist on the email spam detection, we haven't had the insight knowledge about the spam done within the SMS's. This might be because the frequency of spam in these short messages is quite low than the emails. This paper presents different ways of analyzing spam for SMS and a new pre-processing way to get the actual dataset of spam messages. This dataset was then used on different algorithm techniques to find the best working algorithm in terms of both accuracy and recall. Random Forest algorithm was then implemented in a real world application library written in C\# for cross platform .Net development. This library is capable of using a prebuild model for classifying a new dataset for spam and ham.
Malicious traffic has garnered more attention in recent years, owing to the rapid growth of information technology in today's world. In 2007 alone, an estimated loss of 13 billion dollars was made from malware attacks. Malware data in today's context is massive. To understand such information using primitive methods would be a tedious task. In this publication we demonstrate some of the most advanced deep learning techniques available, multilayer perceptron (MLP) and J48 (also known as C4.5 or ID3) on our selected dataset, Advanced Security Network Metrics & Non-Payload-Based Obfuscations (ASNM-NPBO) to show that the answer to managing cyber security threats lie in the fore-mentioned methodologies.
At a time when all it takes to open a Twitter account is a mobile phone, the act of authenticating information encountered on social media becomes very complex, especially when we lack measures to verify digital identities in the first place. Because the platform supports anonymity, fake news generated by dubious sources have been observed to travel much faster and farther than real news. Hence, we need valid measures to identify authors of misinformation to avert these consequences. Researchers propose different authorship attribution techniques to approach this kind of problem. However, because tweets are made up of only 280 characters, finding a suitable authorship attribution technique is a challenge. This research aims to classify authors of tweets by comparing machine learning methods like logistic regression and naive Bayes. The processes of this application are fetching of tweets, pre-processing, feature extraction, and developing a machine learning model for classification. This paper illustrates the text classification for authorship process using machine learning techniques. In total, there were 46,895 tweets used as both training and testing data, and unique features specific to Twitter were extracted. Several steps were done in the pre-processing phase, including removal of short texts, removal of stop-words and punctuations, tokenizing and stemming of texts as well. This approach transforms the pre-processed data into a set of feature vector in Python. Logistic regression and naive Bayes algorithms were applied to the set of feature vectors for the training and testing of the classifier. The logistic regression based classifier gave the highest accuracy of 91.1% compared to the naive Bayes classifier with 89.8%.
Intellectual property (IP) and integrated circuit (IC) piracy are of increasing concern to IP/IC providers because of the globalization of IC design flow and supply chains. Such globalization is driven by the cost associated with the design, fabrication, and testing of integrated circuits and allows avenues for piracy. To protect the designs against IC piracy, we propose a fingerprinting scheme based on side-channel power analysis and machine learning methods. The proposed method distinguishes the ICs which realize a modified netlist, yet same functionality. Our method doesn't imply any hardware overhead. We specifically focus on the ability to detect minimal design variations, as quantified by the number of logic gates changed. Accuracy of the proposed scheme is greater than 96 percent, and typically 99 percent in detecting one or more gate-level netlist changes. Additionally, the effect of temperature has been investigated as part of this work. Results depict 95.4 percent accuracy in detecting the exact number of gate changes when data and classifier use the same temperature, while training with different temperatures results in 33.6 percent accuracy. This shows the effectiveness of building temperature-dependent classifiers from simulations at known operating temperatures.
At present, mobile terminals are widely used in power system and easy to be the target or springboard to attack the power system. It is necessary to have security assessment of power mobile terminal system to enable early warning of potential risks. In the context, this paper builds the security assessment system against to power mobile terminals, with features from security assessment system of general mobile terminals and power application scenarios. Compared with the existing methods, this paper introduces machine learning to the Rank Correlation Analysis method, which relies on expert experience, and uses objective experimental data to optimize the weight parameters of the indicators. From experiments, this paper proves that weights self-learning method can be used to evaluate the security of power mobile terminal system and improve credibility of the result.
Software insecurity is being identified as one of the leading causes of security breaches. In this paper, we revisited one of the strategies in solving software insecurity, which is the use of software quality metrics. We utilized a multilayer deep feedforward network in examining whether there is a combination of metrics that can predict the appearance of security-related bugs. We also applied the traditional machine learning algorithms such as decision tree, random forest, naïve bayes, and support vector machines and compared the results with that of the Deep Learning technique. The results have successfully demonstrated that it was possible to develop an effective predictive model to forecast software insecurity based on the software metrics and using Deep Learning. All the models generated have shown an accuracy of more than sixty percent with Deep Learning leading the list. This finding proved that utilizing Deep Learning methods and a combination of software metrics can be tapped to create a better forecasting model thereby aiding software developers in predicting security bugs.
Machine learning (ML) algorithms provide a good solution for many security sensitive applications, they themselves, however, face the threats of adversary attacks. As a key problem in machine learning, how to design robust feature selection algorithms against these attacks becomes a hot issue. The current researches on defending evasion attacks mainly focus on wrapped adversarial feature selection algorithm, i.e., WAFS, which is dependent on the classification algorithms, and time cost is very high for large-scale data. Since mRMR (minimum Redundancy and Maximum Relevance) algorithm is one of the most popular filter algorithms for feature selection without considering any classifier during feature selection process. In this paper, we propose a novel adversary-aware feature selection algorithm under filter model based on mRMR, named FAFS. The algorithm, on the one hand, takes the correlation between a single feature and a label, and the redundancy between features into account; on the other hand, when selecting features, it not only considers the generalization ability in the absence of attack, but also the robustness under attack. The performance of four algorithms, i.e., mRMR, TWFS (Traditional Wrapped Feature Selection algorithm), WAFS, and FAFS is evaluated on spam filtering and PDF malicious detection in the Perfect Knowledge attack scenarios. The experiment results show that FAFS has a better performance under evasion attacks with less time complexity, and comparable classification accuracy.
To date, numerous ways have been created to learn a fusion solution from data. However, a gap exists in terms of understanding the quality of what was learned and how trustworthy the fusion is for future-i.e., new-data. In part, the current paper is driven by the demand for so-called explainable AI (XAI). Herein, we discuss methods for XAI of the Choquet integral (ChI), a parametric nonlinear aggregation function. Specifically, we review existing indices, and we introduce new data-centric XAI tools. These various XAI-ChI methods are explored in the context of fusing a set of heterogeneous deep convolutional neural networks for remote sensing.
To add more functionality and enhance usability of web applications, JavaScript (JS) is frequently used. Even with many advantages and usefulness of JS, an annoying fact is that many recent cyberattacks such as drive-by-download attacks exploit vulnerability of JS codes. In general, malicious JS codes are not easy to detect, because they sneakily exploit vulnerabilities of browsers and plugin software, and attack visitors of a web site unknowingly. To protect users from such threads, the development of an accurate detection system for malicious JS is soliciting. Conventional approaches often employ signature and heuristic-based methods, which are prone to suffer from zero-day attacks, i.e., causing many false negatives and/or false positives. For this problem, this paper adopts a machine-learning approach to feature learning called Doc2Vec, which is a neural network model that can learn context information of texts. The extracted features are given to a classifier model (e.g., SVM and neural networks) and it judges the maliciousness of a JS code. In the performance evaluation, we use the D3M Dataset (Drive-by-Download Data by Marionette) for malicious JS codes and JSUPACK for benign ones for both training and test purposes. We then compare the performance to other feature learning methods. Our experimental results show that the proposed Doc2Vec features provide better accuracy and fast classification in malicious JS code detection compared to conventional approaches.
Access control for information has primarily focused on access statically granted to subjects by administrators usually in the context of a specific system. Even if mechanisms are available for access revocation, revocations must still be executed manually by an administrator. However, as physical devices become increasingly embedded and interconnected, access control needs to become an integral part of the resource being protected and be generated dynamically by resources depending on the context in which the resource is being used. In this paper, we discuss a set of scenarios for access control needed in current and future systems and use that to argue that an approach for resources to generate and manage their access control policies dynamically on their own is needed. We discuss some approaches for generating such access control policies that may address the requirements of the scenarios.