Biblio
We survey the state of the art of the Internet of Things (IoT) from a wireless communications point of view, as a result of the European FP7 project BUTLER, which focuses on pervasiveness, context-awareness and security for IoT. In particular, we describe the efforts to develop so-called (wireless) enabling technologies, aimed at circumventing the many challenges involved in extending the current set of domains (“verticals”) of IoT applications towards a “horizontal” (i.e. integrated) vision of the IoT. We start by illustrating current research efforts in machine-to-machine (M2M) communications, which are mainly focused on vertical domains, discuss some of them in detail, and then depict the horizontal vision necessary for the future intelligent daily routine (“Smart Life”). We then describe the technical features of the most relevant heterogeneous communications technologies on which the IoT relies, in light of the ongoing M2M service-layer standardization. Finally, we identify and present the key aspects, within three major cross-vertical categories, under which M2M technologies can function as enablers for the horizontal vision of the IoT.
In the last decade, the number of available cores has increased and heterogeneity has grown. In this work, we ask whether the design of current operating systems (OSes) is still appropriate if these trends continue and lead to abundantly available but heterogeneous cores, or whether they force a fundamental rethinking of how systems are designed. We argue that: 1. hiding heterogeneity behind a common hardware interface unifies, to a large extent, the control and coordination of cores and accelerators in the OS; 2. isolating at the network-on-chip rather than with processor features (like privileged mode, memory management unit, ...) allows running untrusted code on arbitrary cores; and 3. providing OS services via protocols over the network-on-chip, instead of via system calls, makes them accessible to arbitrary types of cores as well. In summary, this turns accelerators into first-class citizens and enables a single and convenient programming environment for all cores without the need to trust any application. In this paper, we introduce network-on-chip-level isolation, present the design of our microkernel-based OS, M3, and the common hardware interface, and evaluate the performance of our prototype in comparison to Linux. Somewhat surprisingly, even without using accelerators, M3 outperforms Linux in some application-level benchmarks by more than a factor of five.
This paper introduces a new efficient algorithm for computing Gröbner bases named M4GB. Like Faugère's algorithm F4, it is an extension of Buchberger's algorithm that describes how to store already computed (tail-)reduced multiples of basis polynomials to prevent redundant work in the reduction step, and how to exploit efficient linear algebra for the reduction step. In comparison to F4, it removes further redundant work in the processing of reducible monomials. Furthermore, instead of translating the reduction of many critical pairs into the row reduction of some large matrix, our algorithm is described more natively and is efficient while processing critical pairs one by one. This implies that M4GB typically has to process fewer critical pairs than F4, and reduces the time and data complexity 'staircase' related to the increasing degree of regularity that one observes for F4 on a sequence of problems. To achieve high efficiency, M4GB has been designed specifically to operate only on tail-reduced polynomials, i.e., polynomials of which all terms except the leading term are non-reducible. This allows it to perform full reduction directly in the computation of a term-polynomial multiplication, where all computations are done over coefficient vectors over the non-reducible monomials. We have implemented a version of our new algorithm tailored for dense overdefined polynomial systems as a proof of concept and made our source code publicly available. We have compared our implementation against the implementations of FGBlib, Magma and OpenF4 on various dense Fukuoka MQ challenge problems that we were able to compute in reasonable time and memory. We observed that M4GB uses the least total CPU time and the least memory of all these implementations for those MQ problems, often by a significant factor. In the Fukuoka MQ challenges, the starting challenges of Type V and Type VI have 16 equations, a number chosen based on an extrapolated computational runtime of more than a month using Magma. M4GB allowed us to set new records for these Fukuoka MQ challenges, breaking Type V (F28) up to 18 equations and Type VI (F31) up to 19 equations, each computed within at most 11 days on our dual Xeon system.
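For context only (this is not M4GB itself): a tiny reference computation of the object these algorithms produce, a Gröbner basis of a small polynomial system over a finite field, using SymPy. The system and field below are illustrative assumptions.

```python
# Illustrative sketch: compute a Groebner basis of a toy system over F_31
# (the field of the Type VI challenges) with SymPy's reference implementation.
from sympy import symbols, groebner

x, y, z = symbols("x y z")
system = [x**2 + y*z - 1, x*y + z**2, y**2 - x*z]          # made-up small system
gb = groebner(system, x, y, z, order="grevlex", modulus=31)
print(gb)                                                   # the reduced Groebner basis
```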
Intrusion detection is one of the most prominent and challenging problems faced by cybersecurity organizations. An Intrusion Detection System (IDS) plays a vital role in identifying network security threats. It protects the network from vulnerable source code, viruses, worms and unauthorized intruders in many intranet/internet applications. Despite many open-source APIs and tools for intrusion detection, many network security problems still exist. These problems are handled through proper pre-processing, normalization, feature selection and ranking of benchmark dataset attributes prior to applying self-learning-based classification algorithms. In this paper, we have performed a comprehensive comparative analysis of the benchmark datasets NSL-KDD and CIDDS-001. To obtain optimal results, we have used hybrid feature selection and ranking methods before applying self-learning (machine/deep learning) classification algorithms such as SVM, Naïve Bayes, k-NN, Neural Networks, DNN and DAE. We have analyzed the performance of the IDS through prominent performance indicator metrics such as Accuracy, Precision, Recall and F1-Score. The experimental results show that the k-NN, SVM, NN and DNN classifiers achieve approximately 100% accuracy on the NSL-KDD dataset, whereas the k-NN and Naïve Bayes classifiers achieve approximately 99% accuracy on the CIDDS-001 dataset.
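A minimal sketch of the kind of pre-processing, feature-selection and classification pipeline described above (not the authors' code); the file path, label encoding and parameter choices are illustrative assumptions.

```python
# Sketch: normalization + feature ranking + one of the evaluated classifiers (k-NN)
# on an NSL-KDD-style CSV with a 'label' column (hypothetical file name).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("nsl_kdd_train.csv")                      # hypothetical path
X = pd.get_dummies(df.drop(columns=["label"]))             # encode categorical attributes
y = (df["label"] != "normal").astype(int)                  # binary: attack vs. normal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

clf = Pipeline([
    ("scale", StandardScaler()),                           # normalization
    ("select", SelectKBest(mutual_info_classif, k=20)),    # feature selection / ranking
    ("knn", KNeighborsClassifier(n_neighbors=5)),          # one of the compared classifiers
])
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))      # accuracy, precision, recall, F1
```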
Software Defined Networks (SDNs) have gained prominence recently due to their flexible management and superior configuration functionality of the underlying network. SDNs, with OpenFlow as their primary implementation, allow a centralised controller to drive the decision making for all the supported devices in the network and to manage traffic through routing table changes for incoming flows. In conventional networks, machine learning has been shown to detect malicious intrusions and classify attacks such as DoS, user-to-root, and probe attacks. In this work, we extend the use of machine learning to improve traffic tolerance for SDNs. To achieve this, we extend the functionality of the controller with a resilience framework, ReSDN, that incorporates machine learning to distinguish DoS attacks, focusing on the Neptune attack in our experiments. Our model is trained using the MIT KDD 1999 dataset. The system is developed as a module on top of the POX controller platform and evaluated using the Mininet simulator.
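A hedged sketch of the classification step only (the POX controller integration is not shown): training a classifier on KDD'99-style flow features to flag Neptune (SYN-flood) traffic. The file path and feature subset are assumptions for illustration.

```python
# Sketch: detect Neptune (SYN-flood) flows from a KDD'99-style feature extract.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

cols = ["duration", "protocol_type", "service", "flag", "src_bytes",
        "dst_bytes", "count", "srv_count", "serror_rate", "label"]
df = pd.read_csv("kddcup99_subset.csv", names=cols)        # hypothetical extract
X = pd.get_dummies(df.drop(columns=["label"]))             # encode categorical fields
y = (df["label"] == "neptune").astype(int)                 # Neptune vs. everything else

model = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(model, X, y, cv=5).mean())           # rough detection accuracy
```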
Detecting malicious code by exact matching against collected datasets is becoming a large-scale identification problem due to the existence of new malware variants. Being able to promptly and accurately identify new attacks enables security experts to respond effectively. My proposal is to develop an automated framework for the identification of unknown vulnerabilities by leveraging current neural network techniques. This has significant and immediate value for the security field, as current anti-virus software is typically able to recognize the malware type only after its infection, and preventive measures are limited. Artificial Intelligence plays a major role in automatic malware classification: numerous machine-learning methods, both supervised and unsupervised, have been researched to classify malware into families based on features acquired by static and dynamic analysis. The value of automated identification is clear, as feature engineering is both a time-consuming and time-sensitive task, with new malware having to be studied as it is observed in the wild.
One of the challenges in providing communities with wider access to scientific databases is the need for knowledge of database languages like Structured Query Language (SQL). Although the SQL language has been published in many forms, not everybody is able to write SQL queries. Another challenge is that it might not be practical to make the public aware of the structure of databases. There is a need for novice users to query relational databases using their natural language. To solve this problem, many natural language interfaces to structured databases have been developed. The goal is to provide a more intuitive method for generating database queries and delivering responses. Through social media, which makes it possible to interact with a wide section of the population, and with the help of natural language processing, researchers at the Atmospheric Radiation Measurement (ARM) Data Center at Oak Ridge National Laboratory (ORNL) have developed a concept to enable easy search and retrieval of data from several environmental data centers for the scientific community. Using a machine learning framework that maps natural language text to thousands of datasets, instruments, variables, and data streams, the prototype system would allow users to request data through Twitter and receive a link (via tweet) to applicable data results on the project's search catalog tailored to their keywords. This automated identification of relevant data from various petascale archives at ORNL could increase convenience, access, and use of the project's data by the broader community. In this paper we discuss how some data-intensive projects at ORNL are using innovative ways to help in data discovery.
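An illustrative sketch (not the ARM/ORNL system) of the underlying idea: mapping a free-text request to the closest dataset description using simple text similarity. The catalog entries and the example tweet are made up.

```python
# Sketch: retrieve the best-matching dataset record for a natural-language request
# via TF-IDF cosine similarity over dataset descriptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

catalog = {
    "sgp_met_e13": "surface meteorology temperature humidity wind Southern Great Plains",
    "nsa_ceil_c1": "ceilometer cloud base height North Slope Alaska",
    "ena_aos_s1":  "aerosol observing system particle concentration Eastern North Atlantic",
}                                                           # hypothetical catalog entries

vec = TfidfVectorizer().fit(catalog.values())
doc_matrix = vec.transform(catalog.values())

tweet = "looking for cloud base height data from Alaska"
scores = cosine_similarity(vec.transform([tweet]), doc_matrix)[0]
best = max(zip(catalog, scores), key=lambda kv: kv[1])
print(best)                                                 # e.g. ('nsa_ceil_c1', score)
```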
Nowadays, sentiment analysis methods are becoming more and more popular, especially with the growing number of social media platform users. In this context, this paper presents a sentiment analysis approach that can faithfully capture the sentiment orientation of Arabic Twitter posts, based on a novel data representation and machine learning techniques. The proposed approach uses a wide range of features: lexical, surface-form, syntactic, etc. We also make use of lexicon features inferred from two Arabic sentiment word lexicons. To build our supervised sentiment analysis system, we use several standard classification methods (Support Vector Machines, K-Nearest Neighbour, Naïve Bayes, Decision Trees, Random Forest) known for their effectiveness on such classification problems. In our study, the Support Vector Machines classifier outperforms the other supervised algorithms for Arabic Twitter sentiment analysis. Via ablation experiments, we show the positive impact of lexicon-based features on achieving higher prediction performance.
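A minimal sketch of how surface-form features can be combined with a lexicon-based score and fed to an SVM; the two tweets and the tiny lexicon are toy placeholders, not the authors' data.

```python
# Sketch: TF-IDF word features + one lexicon-score feature, classified with a linear SVM.
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

tweets = ["الخدمة ممتازة جدا", "تجربة سيئة ولن أكررها"]      # toy examples
labels = [1, 0]                                              # positive / negative
lexicon = {"ممتازة": 1.0, "سيئة": -1.0}                      # toy sentiment lexicon

def lexicon_score(text):
    return sum(lexicon.get(tok, 0.0) for tok in text.split())

vec = TfidfVectorizer()
X_words = vec.fit_transform(tweets)                          # lexical / surface-form features
X_lex = csr_matrix([[lexicon_score(t)] for t in tweets])     # lexicon-based feature
X = hstack([X_words, X_lex])

clf = LinearSVC().fit(X, labels)
```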
Phishing is a major problem of the internet era, in which the security of our data on the web is gaining increasing importance. Phishing is one of the most harmful ways of surreptitiously obtaining credential information such as usernames, passwords or account numbers from users. Users are often not aware of this type of attack and may later themselves become part of phishing attacks. The losses may involve financial funds, personal information, brand reputation or brand trust. Detection of phishing sites is therefore necessary. In this paper we design a framework for phishing detection using URLs.
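A hedged sketch of URL-based phishing detection: hand-crafted lexical features of the URL fed to a classifier. The feature set, sample URLs and labels are illustrative only, not the framework described in the paper.

```python
# Sketch: simple lexical URL features + logistic regression for phishing detection.
import re
from urllib.parse import urlparse
from sklearn.linear_model import LogisticRegression

def url_features(url):
    p = urlparse(url)
    return [
        len(url),                                            # overall URL length
        url.count("."),                                      # number of dots
        url.count("-"),                                      # hyphens in the URL
        int(bool(re.match(r"^\d+\.\d+\.\d+\.\d+$", p.netloc))),  # raw IP used as host
        int("@" in url),                                     # '@' symbol present
        int(p.scheme != "https"),                            # not using HTTPS
    ]

urls = ["https://example.com/login", "http://192.168.12.4/paypal-update@verify"]
labels = [0, 1]                                              # 0 = legitimate, 1 = phishing (toy)

clf = LogisticRegression().fit([url_features(u) for u in urls], labels)
print(clf.predict([url_features("http://secure-bank-update.com/signin")]))
```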
We introduce a machine learning approach for distinguishing between integrated circuits fabricated in a ratified facility and circuits originating from an unknown or undesired source based on parametric measurements. Unlike earlier approaches, which seek to achieve the same objective in a general, design-independent manner, the proposed method leverages the interaction between the idiosyncrasies of the fabrication facility and a specific design, in order to create a customized fab-of-origin membership test for the circuit in question. Effectiveness of the proposed method is demonstrated using two large industrial datasets from a 65nm Texas Instruments RF transceiver manufactured in two different fabrication facilities.
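A sketch of the membership-test idea under stated assumptions: a one-class model trained on parametric measurements of chips from the trusted fab flags chips whose measurements look out of distribution. The data below is synthetic; the actual method uses production parametric test data and a design-specific model.

```python
# Sketch: fab-of-origin membership test via a one-class SVM over parametric measurements.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
trusted = rng.normal(loc=0.0, scale=1.0, size=(500, 20))     # measurements, trusted fab (synthetic)
unknown = rng.normal(loc=0.6, scale=1.2, size=(50, 20))      # chips from an unknown source (synthetic)

scaler = StandardScaler().fit(trusted)
model = OneClassSVM(nu=0.05, gamma="scale").fit(scaler.transform(trusted))

print((model.predict(scaler.transform(unknown)) == -1).mean())  # fraction flagged as outliers
```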
Live migration is one of the key technologies for improving data center utilization, power efficiency, and maintenance. Various live migration algorithms have been proposed, each exhibiting distinct characteristics in terms of completion time, amount of data transferred, virtual machine (VM) downtime, and VM performance degradation. To make matters worse, not only the migration algorithm but also the applications running inside the migrated VM affect the different performance metrics. With service-level agreements and operational constraints in place, choosing the optimal live migration technique has so far been an open question. In this work, we propose an adaptive machine learning-based model that is able to predict with high accuracy the key characteristics of live migration as a function of the migration algorithm and the workload running inside the VM. We discuss the important input parameters for accurately modeling the target metrics, and describe how to profile them with little overhead. Compared to existing work, we are not only able to model all commonly used migration algorithms but also to predict important metrics that have not been considered so far, such as the performance degradation of the VM. In a comparison with the state of the art, we show that the proposed model outperforms existing work by a factor of 2 to 5.
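A minimal sketch of the prediction idea: a regression model mapping profiled VM and workload parameters to a migration metric such as total migration time. The feature names and the synthetic training data are assumptions for illustration, not the authors' model.

```python
# Sketch: regression from profiled parameters to a live-migration metric (synthetic data).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 300
features = np.column_stack([
    rng.uniform(1, 64, n),        # VM memory size (GB)
    rng.uniform(0.1, 10, n),      # page dirty rate (GB/s)
    rng.uniform(1, 40, n),        # available network bandwidth (Gbit/s)
    rng.uniform(0, 1, n),         # working-set ratio
])
migration_time = features[:, 0] / features[:, 2] * (1 + 3 * features[:, 3]) \
                 + rng.normal(0, 0.5, n)                     # synthetic target metric

model = GradientBoostingRegressor()
print(cross_val_score(model, features, migration_time, cv=5, scoring="r2").mean())
```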
To add more functionality and enhance the usability of web applications, JavaScript (JS) is frequently used. Despite the many advantages and usefulness of JS, an annoying fact is that many recent cyberattacks such as drive-by-download attacks exploit vulnerabilities of JS code. In general, malicious JS code is not easy to detect, because it sneakily exploits vulnerabilities of browsers and plugin software, and attacks visitors of a web site unknowingly. To protect users from such threats, the development of an accurate detection system for malicious JS is needed. Conventional approaches often employ signature- and heuristic-based methods, which are prone to suffer from zero-day attacks, i.e., causing many false negatives and/or false positives. To address this problem, this paper adopts a machine-learning approach to feature learning called Doc2Vec, which is a neural network model that can learn context information of texts. The extracted features are given to a classifier model (e.g., SVM and neural networks), which judges the maliciousness of a JS code. In the performance evaluation, we use the D3M dataset (Drive-by-Download Data by Marionette) for malicious JS codes and JSUPACK for benign ones, for both training and test purposes. We then compare the performance to other feature learning methods. Our experimental results show that the proposed Doc2Vec features provide better accuracy and faster classification in malicious JS code detection compared to conventional approaches.
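A minimal sketch of the feature-learning step, under the assumption that tokenized JS code is embedded with Doc2Vec and the vectors are fed to an SVM; the two snippets and labels are toy placeholders (the paper uses the D3M and JSUPACK corpora).

```python
# Sketch: Doc2Vec embeddings of tokenized JS snippets, classified with an SVM.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.svm import SVC

snippets = [
    "var a = document . createElement ( 'script' ) ; eval ( unescape ( payload ) )",
    "function add ( a , b ) { return a + b ; }",
]
labels = [1, 0]                                              # 1 = malicious, 0 = benign (toy)

docs = [TaggedDocument(s.split(), [i]) for i, s in enumerate(snippets)]
d2v = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)  # learn context features

X = [d2v.infer_vector(s.split()) for s in snippets]          # per-snippet feature vectors
clf = SVC(kernel="rbf").fit(X, labels)                       # maliciousness classifier
```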
The silicon Physical Unclonable Function (PUF) is arguably the most promising hardware security primitive. In particular, PUFs that are capable of generating a large amount of challenge-response pairs (CRPs) can be used in many security applications. However, these CRPs can also be exploited by machine learning attacks to model the PUF and predict its response. In this paper, we first show that, based on data in the public domain, two popular PUFs that can generate CRPs (i.e., the arbiter PUF and the reconfigurable ring oscillator (RO) PUF) can be broken by a simple logistic regression (LR) attack with about 99% accuracy. We then propose a feedback structure to XOR the PUF response with the challenge and challenge the PUF again to generate the response. Results show that this successfully reduces LR's learning accuracy to the low 50% range, but an artificial neural network (ANN) learning attack still has an 80% success rate. Therefore, we propose a configurable ring oscillator based dual-mode PUF which works with both an odd number of inverters (like the reconfigurable RO PUF) and an even number of inverters (like a bistable ring (BR) PUF). Since there are currently no known attacks that can model both the RO PUF and the BR PUF, the dual-mode PUF will be resistant to modeling attacks as long as we can hide its working mode from the attackers, which we achieve with two practical methods. Finally, we implement the proposed dual-mode PUF on Nexys 4 FPGA boards and collect real measurements to show that it reduces the learning accuracy of LR and ANN to the mid-50% and low-60% range, respectively. In addition, it meets the PUF requirements of uniqueness, randomness, and robustness.
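A sketch of the logistic-regression modelling attack on a simulated arbiter PUF: challenges are mapped to the standard parity feature vector, on which an ideal arbiter PUF is a linear model. The delay weights are random stand-ins for measured CRPs.

```python
# Sketch: LR modelling attack on a simulated (noise-free) 64-stage arbiter PUF.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_stages, n_crps = 64, 20000
w = rng.normal(size=n_stages + 1)                    # simulated stage delay differences + bias

challenges = rng.integers(0, 2, size=(n_crps, n_stages))
signs = 1 - 2 * challenges                           # map {0,1} -> {+1,-1}
phi = np.cumprod(signs[:, ::-1], axis=1)[:, ::-1]    # parity (phi) feature vector
phi = np.hstack([phi, np.ones((n_crps, 1))])         # bias term
responses = (phi @ w > 0).astype(int)                # ideal responses

X_tr, X_te, y_tr, y_te = train_test_split(phi, responses, test_size=0.5, random_state=0)
attack = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
print(attack.score(X_te, y_te))                      # typically ~99% on a plain arbiter PUF
```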
Physical Unclonable Functions (PUFs) have been designed for many security applications such as identification, authentication of devices and key generation, especially for lightweight electronics. Traditional approaches to enhancing security, such as hash functions, may be expensive and resource dependent. However, modelling attacks using machine learning (ML) show the vulnerability of most PUFs. In this paper, a combination of a 32-bit current mirror and 16-bit arbiter PUFs in 65nm CMOS technology is proposed to improve resilience against modelling attacks. Both PUFs are individually vulnerable to machine learning attacks, and we reduce the output prediction rate from 99.2% and 98.8%, respectively, to 60%.
This paper presents a machine learning classifier designed to identify SQL injection vulnerabilities in PHP code. Both classical and deep-learning-based machine learning algorithms were used to train and evaluate classifier models using input validation and sanitization features extracted from source code files. In ten-fold cross-validation, a model trained using a Convolutional Neural Network (CNN) achieved the highest precision (95.4%), while a model based on a Multilayer Perceptron (MLP) achieved the highest recall (63.7%) and the highest F-measure (0.746).
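A rough, hedged stand-in for the classification step: character-level features of PHP snippets fed to an MLP. The real models use hand-extracted input-validation and sanitization features; the snippets, labels and vectorizer choice below are illustrative assumptions.

```python
# Sketch: character n-gram features of PHP snippets + MLP classifier (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

snippets = [
    "$q = \"SELECT * FROM users WHERE id = \" . $_GET['id'];",        # unsanitized query
    "$stmt = $pdo->prepare('SELECT * FROM users WHERE id = ?');",     # parameterized query
]
labels = [1, 0]                                              # 1 = vulnerable, 0 = safe (toy)

vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
X = vec.fit_transform(snippets)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, labels)
```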