Bibliography
Unmanned systems are increasing in number, while their manning requirements remain the same. To decrease manpower demands, machine learning techniques and autonomy are gaining traction and visibility. One barrier is human perception and understanding of autonomy. Machine learning techniques can result in “black box” algorithms that may yield high fitness but poor comprehension by operators. However, Interactive Machine Learning (IML), a method of incorporating human input over the course of algorithm development using neuro-evolutionary machine-learning techniques, may offer a solution. IML is evaluated here for its impact on developing autonomous team behaviors in an area search task. Initial findings show that IML-generated search plans were chosen over plans generated using a non-interactive ML technique, even though participants trusted them slightly less. Further, participants discriminated between the two types of plans with a high degree of accuracy, suggesting that the IML approach imparts behavioral characteristics into algorithms, making them more recognizable. Together, the results lay the foundation for exploring how to team humans successfully with ML behavior.
Deep learning techniques have demonstrated the ability to perform a variety of object recognition tasks using visible imager data; however, deep learning has not been implemented as a means to autonomously detect and assess targets of interest in a physical security system. We demonstrate the use of transfer learning on a convolutional neural network (CNN) to significantly reduce training time while keeping detection accuracy of physical-security-relevant targets high. Unlike many detection algorithms employed by video analytics within physical security systems, this method does not rely on temporal data to construct a background scene; targets of interest can halt motion indefinitely and still be detected by the implemented CNN. A key advantage of using deep learning is the ability of a network to improve over time. Periodic retraining can lead to better detection and higher confidence rates. We investigate training data size versus CNN test accuracy using physical security video data. Due to the large number of visible imagers, the significant volume of data collected daily, and the ground truth data already produced by the currently deployed human-in-the-loop process, physical security systems present a unique environment that is well suited for analysis via CNNs. This could lead to the creation of an algorithmic element that reduces human burden and decreases human-analyzed nuisance alarms.
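As a rough illustration of the transfer-learning setup described above (not the authors' exact architecture or data), the following Python/PyTorch sketch freezes an ImageNet-pretrained backbone and retrains only a small classification head; the class count and the commented-out DataLoader are placeholders.

# A minimal sketch of transfer learning on a CNN, assuming PyTorch/torchvision
# and a hypothetical folder of labeled security-camera frames.
import torch
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained backbone instead of training from scratch.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the convolutional feature extractor; only the new head is trained,
# which is what keeps training time low.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a head for the security-relevant classes
# (e.g. person / vehicle / background; class names are placeholders).
num_classes = 3
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Training loop over a hypothetical DataLoader of (frame, label) batches:
# for frames, labels in loader:
#     optimizer.zero_grad()
#     loss = criterion(model(frames), labels)
#     loss.backward()
#     optimizer.step()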
New and unseen network attacks pose a great threat to signature-based detection systems. Consequently, machine learning-based approaches, which rely on features extracted from network data, have been designed to detect such attacks. A key problem is that the distribution of features can differ between the training and testing datasets, which degrades the performance of the learned models. Moreover, generating labeled datasets is very time-consuming and expensive, which undercuts the effectiveness of supervised learning approaches. In this paper, we propose using transfer learning to detect previously unseen attacks. The main idea is to learn, from labeled training sets and unlabeled testing sets containing different types of attacks, an optimized representation that is invariant to changes in attack behavior, and to feed this representation to a supervised classifier. To the best of our knowledge, this is the first effort to use a feature-based transfer learning technique to detect unseen variants of network attacks. Furthermore, this technique can be used with any common base classifier. We evaluated the technique on publicly available datasets, and the results demonstrate the effectiveness of transfer learning for detecting new network attacks.
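The abstract does not name the exact algorithm, so the following scikit-learn sketch only illustrates the general feature-based transfer idea: fit a shared low-dimensional representation on both the labeled source and unlabeled target data, then train an ordinary classifier on the projected source features. The synthetic data and all variable names are illustrative.

# A rough stand-in for the feature-based transfer idea (not the paper's exact
# algorithm): project labeled source and unlabeled target features into a
# shared low-dimensional space fit on both domains, then train a standard
# classifier on the projected source data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_source = rng.normal(size=(500, 20))           # labeled training flows
y_source = rng.integers(0, 2, size=500)         # 0 = benign, 1 = attack
X_target = rng.normal(loc=0.5, size=(300, 20))  # unlabeled flows, shifted distribution

# Fit the representation on the union of both domains so it reflects the
# target's feature distribution as well as the source's.
pca = PCA(n_components=10).fit(np.vstack([X_source, X_target]))

# Any common base classifier can sit on top of the shared representation.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(pca.transform(X_source), y_source)
predictions = clf.predict(pca.transform(X_target))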
Linking the growing IPv6 deployment to existing IPv4 addresses is an interesting field of research, be it for network forensics, structural analysis, or reconnaissance. In this work, we focus on classifying pairs of server IPv6 and IPv4 addresses as siblings, i.e., running on the same machine. Our methodology leverages active measurements of TCP timestamps and other network characteristics, which we measure against a diverse ground truth of 682 hosts. We define and extract a set of features, including estimation of variable (as opposed to constant) remote clock skew. On these features, we train a manually crafted algorithm as well as a machine-learned decision tree. By conducting several measurement runs and training in cross-validation rounds, we aim to create models that generalize well and do not overfit our training data. We find both models to exceed 99% precision in training and test performance. We validate scalability by classifying 149k siblings in a large-scale measurement of 371k sibling candidates. We argue that this methodology, thoroughly cross-validated and likely to generalize well, can aid comparative studies of IPv6 and IPv4 behavior in the Internet. Striving for applicability and replicability, we release ready-to-use source code and raw data from our study.
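A minimal sketch of the machine-learned half of this methodology, assuming a hand-crafted feature matrix (the feature names in the comment are placeholders rather than the authors' exact feature set): train a decision tree and report cross-validated precision.

# Illustrative only: decision tree on synthetic sibling features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Each row: [clock-skew difference, timestamp-offset difference, option-fingerprint match]
X = rng.normal(size=(682, 3))
y = rng.integers(0, 2, size=682)   # 1 = sibling pair, 0 = non-sibling

tree = DecisionTreeClassifier(max_depth=5, random_state=0)
scores = cross_val_score(tree, X, y, cv=5, scoring="precision")
print("cross-validated precision:", scores.mean())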
Vehicular ad hoc networks (VANETs) are attracting more attention from both academia and the automotive industry due to the rapid development of wireless communication technologies. With this development, connected cars are increasingly being equipped with more sensors, processors, storage, and communication devices as they start to provide both infotainment and safety services through V2X communication. This growth in connected vehicles also brings a rise in security attacks and potential security threats. In a vehicular environment, security is one of the most important issues, and it must be addressed before VANETs can be widely deployed. Conventional VANETs have some unique characteristics such as high mobility, dynamic topology, and short connection times. Since an attacker can launch unexpected attacks at any time, it is difficult to predict these attacks in advance. To handle this problem, we propose a collaborative security attack detection mechanism in software-defined vehicular networks that uses a multi-class support vector machine (SVM) to detect various types of attacks dynamically. We compare our security mechanism to an existing distributed approach and present simulation results. The results demonstrate that the proposed security mechanism can effectively identify the types of attacks and achieves good performance in terms of precision, recall, and accuracy.
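A minimal multi-class SVM sketch in the spirit of this mechanism, assuming flow-level statistics collected at the SDN controller; the synthetic data and the class labels are placeholders, not the paper's feature set.

# Multi-class SVM on synthetic per-flow statistics.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 8))        # per-flow statistics
y = rng.integers(0, 4, size=1000)     # 0 = normal, 1..3 = attack classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# SVC handles multi-class classification via a one-vs-one scheme internally.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_tr, y_tr)
print(classification_report(y_te, svm.predict(X_te)))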
As smart grid systems become increasingly reliant on networks of control devices, attacks on their inherent security vulnerabilities could lead to catastrophic system failures. Network Intrusion Detection Systems (NIDS) detect such attacks by learning traffic patterns and finding anomalies in them. However, data for robust training and evaluation of NIDS is rarely available due to the operational and security risks associated with sharing such data. Consequently, we present Melody, a scalable framework for synthesizing such datasets. Melody models both the cyber and physical components of the smart grid by integrating a simulated physical network with an emulated cyber network, using virtual time for high temporal fidelity. We present a systematic approach to generate traffic representing multi-stage attacks, where each stage is either emulated or recreated with a mechanism to replay arbitrary packet traces. We describe and evaluate the suitability of Melody's datasets for intrusion detection by analyzing the extent to which temporal accuracy of pertinent features is maintained.
With the increasing use of the Internet, it is equally important to protect intellectual property. For copyright protection, a blind digital watermarking algorithm using SVD and OSELM in the IWT domain has been proposed. During the embedding process, SVD is applied to the coefficient blocks in the IWT domain to obtain the singular values. The singular values are modulated to embed the watermark in the host image. An online sequential extreme learning machine (OSELM) is trained to learn the relationship between the original coefficients and the corresponding watermarked versions. During the extraction process, this trained OSELM is used to extract the embedded watermark logo blindly, as no original host image is required. The watermarked image is subjected to various attacks such as blurring, noise, sharpening, rotation, and cropping. The experimental results show that the proposed watermarking scheme is robust against these attacks. The extracted watermark closely resembles the original watermark and is effective for proving ownership.
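A hedged sketch of the embedding step only, with an ordinary discrete wavelet transform (via pywt) standing in for the integer wavelet transform, a fixed 8x8 block size, and a simple additive modulation of the largest singular value; none of these choices are claimed to match the paper's exact scheme, and the OSELM training step is omitted.

# Embedding sketch: DWT as a stand-in for IWT, SVD per coefficient block.
import numpy as np
import pywt

def embed_block(block, bit, alpha=10.0):
    """Modulate the largest singular value of an 8x8 coefficient block."""
    U, S, Vt = np.linalg.svd(block, full_matrices=False)
    S[0] = S[0] + alpha * bit          # bit is 0 or 1 from the watermark logo
    return U @ np.diag(S) @ Vt

host = np.random.rand(256, 256)        # placeholder host image
LL, (LH, HL, HH) = pywt.dwt2(host, "haar")

watermark_bits = np.random.randint(0, 2, size=(LL.shape[0] // 8) * (LL.shape[1] // 8))

k = 0
for i in range(0, LL.shape[0], 8):
    for j in range(0, LL.shape[1], 8):
        LL[i:i+8, j:j+8] = embed_block(LL[i:i+8, j:j+8], watermark_bits[k])
        k += 1

watermarked = pywt.idwt2((LL, (LH, HL, HH)), "haar")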
Smartphones have become the pervasive personal computing platform. Recent years have thus witnessed exponential growth in research and development of secure and usable authentication schemes for smartphones. Several explicit (e.g., PIN-based) and/or implicit (e.g., biometrics-based) authentication methods have been designed and published in the literature, and some of them have been embedded in commercial mobile products as well. However, the published studies report only the brighter side of the proposed scheme(s), e.g., the higher accuracy attained by the proposed mechanism, while other associated operational issues, such as computational overhead, robustness to different environmental conditions/attacks, and usability, are intentionally or unintentionally ignored. More specifically, most published frameworks do not discuss or explore usability or environment-related measures, or any evaluation criterion beyond accuracy under the zero-effort threat model. Thus, their baseline operations usually give a false sense of progress. This paper, therefore, presents guidelines to researchers for designing, implementing, and evaluating smartphone user authentication methods so as to have a positive impact on future technological developments.
A technique and algorithms for early detection of an attack in progress and subsequent blocking of malicious traffic are proposed. The initial separation of mixed traffic into trustworthy and malicious traffic is carried out using cluster analysis. Newly arriving requests are then classified using different classifiers, trained on the resulting samples and evaluated against the developed success criteria.
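An illustrative two-stage sketch of this idea: cluster the mixed traffic into two groups, treat the cluster assignments as provisional trustworthy/malicious labels, and train a classifier that can score newly arriving requests. The features and the classifier choice are synthetic placeholders.

# Stage 1: unsupervised separation; stage 2: supervised classification of new requests.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X_mixed = np.vstack([rng.normal(0, 1, size=(400, 5)),     # normal requests
                     rng.normal(3, 1, size=(100, 5))])    # attack requests

# Stage 1: cluster the mixed traffic into two groups.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_mixed)

# Stage 2: use the clustered data as a training sample for a classifier
# that scores each newly arriving request.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_mixed, labels)
new_request = rng.normal(3, 1, size=(1, 5))
print("predicted cluster/class:", clf.predict(new_request)[0])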
Data analytics is being increasingly used in cyber-security problems and has been found useful in cases where data volume and heterogeneity make manual assessment by security experts cumbersome. In practical cyber-security scenarios involving data-driven analytics, obtaining data with annotations (i.e., ground-truth labels) is a challenging and well-known limiting factor for many supervised security analytics tasks. Significant portions of the large datasets typically remain unlabelled, as the task of annotation is extensively manual and requires a huge amount of expert intervention. In this paper, we propose an effective active learning approach that can efficiently address this limitation in a practical cyber-security problem of phishing categorization, whereby we use a human-machine collaborative approach to design a semi-supervised solution. An initial classifier is learnt on a small amount of the annotated data and is then gradually updated in an iterative manner by shortlisting only those samples from the large pool of unlabelled data that are most likely to improve classifier performance quickly. Prioritized active learning shows significant promise for achieving faster convergence in classification performance in a batch learning framework, thus requiring even less human annotation effort. A useful feature-weight update technique combined with active learning shows promising classification performance for categorizing phishing/malicious URLs without requiring a large amount of annotated training samples to be available during training. In experiments with several collections of PhishMonger's Targeted Brand dataset, the proposed method shows significant improvement over the baseline by as much as 12%.
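A minimal pool-based active-learning loop with uncertainty sampling, offered as a stand-in for the prioritized selection described above (the paper's actual prioritization and feature-weight update are not reproduced); the URL feature matrix, batch size, and round count are arbitrary.

# Pool-based active learning: query the least certain samples each round.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X_pool = rng.normal(size=(2000, 30))
y_pool = rng.integers(0, 2, size=2000)       # hidden labels, revealed on query

labeled = list(range(20))                    # small initial annotated set
unlabeled = [i for i in range(2000) if i not in labeled]

clf = LogisticRegression(max_iter=1000)
for _ in range(10):                          # 10 annotation rounds
    clf.fit(X_pool[labeled], y_pool[labeled])
    # Query the batch of samples the classifier is least certain about.
    probs = clf.predict_proba(X_pool[unlabeled])[:, 1]
    uncertainty = np.abs(probs - 0.5)
    batch = [unlabeled[i] for i in np.argsort(uncertainty)[:25]]
    labeled.extend(batch)                    # a human would annotate these samples
    unlabeled = [i for i in unlabeled if i not in batch]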
Several proposed defect prediction models are effective when historical datasets are available, but defect prediction becomes difficult when no historical data exist. Cross-project defect prediction (CPDP), which uses projects from other sources/companies to predict defects in the target projects, has been proposed in recent studies and has shown promising results. However, the performance of most CPDP approaches is still far from satisfactory, mainly due to the distribution mismatch between the source and target projects. In this study, a credibility theory based Naïve Bayes (CNB) classifier is proposed to establish a novel reweighting mechanism between the source and target projects, so that the source data can simultaneously adapt to the target data distribution and retain its own pattern. Our experimental results show the feasibility of the novel algorithm design and demonstrate significant improvement by CNB over other CPDP approaches in terms of the performance metrics considered.
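The credibility-theory weighting itself is not reproduced here; the following scikit-learn sketch only illustrates the general reweighting idea, using a crude inverse-distance similarity to the target centroid as the instance weight for a Gaussian Naive Bayes classifier trained on source-project data.

# Instance-weighted Naive Bayes as a rough analogue of the reweighting idea.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(5)
X_source = rng.normal(0.0, 1.0, size=(800, 10))   # metrics from other projects
y_source = rng.integers(0, 2, size=800)           # 1 = defective module
X_target = rng.normal(0.8, 1.2, size=(200, 10))   # unlabeled target project

# Crude similarity weight: inverse distance from each source instance to the
# target centroid (the actual CNB weights are derived from credibility theory).
target_centroid = X_target.mean(axis=0)
dist = np.linalg.norm(X_source - target_centroid, axis=1)
weights = 1.0 / (1.0 + dist)

nb = GaussianNB()
nb.fit(X_source, y_source, sample_weight=weights)
target_predictions = nb.predict(X_target)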
Software Defined Networks (SDNs) have gained prominence recently due to their flexible management and superior configuration functionality of the underlying network. SDNs, with OpenFlow as their primary implementation, allow the use of a centralised controller to drive decision making for all the supported devices in the network and to manage traffic through routing table changes for incoming flows. In conventional networks, machine learning has been shown to detect malicious intrusions and classify attacks such as DoS, user-to-root, and probe attacks. In this work, we extend the use of machine learning to improve traffic tolerance for SDNs. To achieve this, we extend the functionality of the controller to include a resilience framework, ReSDN, that incorporates machine learning to distinguish DoS attacks, focusing on the Neptune attack in our experiments. Our model is trained using the MIT KDD 1999 dataset. The system is developed as a module on top of the POX controller platform and evaluated using the Mininet simulator.
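A sketch of the detection component only (the POX/Mininet integration is out of scope here), assuming the KDD'99 data as shipped with scikit-learn and keeping only the numeric feature columns; the classifier choice is illustrative rather than the ReSDN model.

# Train a classifier to separate 'neptune' flows from normal KDD'99 traffic.
import numpy as np
from sklearn.datasets import fetch_kddcup99
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

kdd = fetch_kddcup99(percent10=True)                 # downloads on first use
mask = np.isin(kdd.target, [b"normal.", b"neptune."])
# Drop the categorical protocol/service/flag columns (indices 1, 2, 3).
numeric_cols = [i for i in range(kdd.data.shape[1]) if i not in (1, 2, 3)]
X = kdd.data[mask][:, numeric_cols].astype(float)
y = (kdd.target[mask] == b"neptune.").astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
print("neptune detection accuracy:", accuracy_score(y_te, clf.predict(X_te)))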
Security issues in IoT-based cyber-physical systems (CPS) are exacerbated by human participation in cyber-physical-human systems (CPHS), due to vulnerabilities in both the technologies and the human involvement. A holistic framework for mitigating security threats in the IoT-based CPHS environment is presented to address these issues. We have developed a threat model involving human elements in the CPHS environment. Research questions, directions, and ideas with respect to securing IoT-based CPHS against collaborative attacks are presented.
Cloud computing has become prominent, with a seemingly unlimited amount of storage and computation available to users. Yet security is a major issue that hampers the growth of the cloud. In this research we investigate a collaborative Intrusion Detection System (IDS) based on the ensemble learning method. It uses weak classifiers and exploits the untapped resources of the cloud to detect various types of attacks on the cloud system. In the proposed system, tasks are distributed among the available virtual machines (VMs), and the individual results are then merged for the final adaptation of the learning model. Performance evaluation is carried out using decision trees and fuzzy classifiers on KDD99, one of the largest datasets for IDS. The dataset is segmented in order to mimic the behavior of real-time data traffic occurring in a real cloud environment. The experimental results show that the proposed approach reduces execution time with improved accuracy and is fault-tolerant when handling VM failures. The system is a proof-of-concept model for a scalable, cloud-based distributed system that is able to exploit untapped resources, and it may serve as a base model for a real-time hierarchical IDS.
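A conceptual sketch of the distribute-then-merge idea on synthetic data: each "VM" trains a weak learner on its own segment of the dataset and the individual predictions are merged by majority vote. A real deployment would run the segments on separate virtual machines rather than in one process.

# Segment the data, train one weak learner per segment, merge by majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)
X = rng.normal(size=(3000, 12))
y = rng.integers(0, 2, size=3000)

# Segment the dataset to mimic traffic arriving at different VMs.
segments = np.array_split(np.arange(3000), 5)
models = [DecisionTreeClassifier(max_depth=3, random_state=0)
          .fit(X[idx], y[idx]) for idx in segments]

def ensemble_predict(models, X_new):
    # Merge individual VM results by majority voting.
    votes = np.stack([m.predict(X_new) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

print(ensemble_predict(models, X[:5]))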
This paper introduces an ensemble model that solves the binary classification problem by combining basic logistic regression with two recent advanced paradigms: extreme gradient boosted decision trees (xgboost) and deep learning. To obtain the best result when integrating sub-models, we introduce a solution for splitting and selecting sets of features for the sub-model training. In addition to the ensemble model, we propose a flexible, robust, and highly scalable new scheme for building a composite classifier that tries to simultaneously implement multiple layers of model decomposition and output aggregation to maximally reduce both the bias and variance (spread) components of classification error. We demonstrate the power of our ensemble model on the problem of predicting the outcome of Hearthstone, a turn-based computer game, based on game state information. The excellent predictive performance of our model was confirmed by a second place in the final ranking among 188 competing teams.
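A simplified sketch of the scheme using scikit-learn stand-ins (GradientBoostingClassifier for xgboost and a small MLP for the deep-learning sub-model); each sub-model is trained on its own slice of a synthetic feature matrix and the predicted probabilities are averaged, which is only one of many possible aggregation layers.

# Feature-split sub-models with a simple probability-averaging aggregation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 30))          # game-state features (synthetic)
y = rng.integers(0, 2, size=1000)        # 1 = first player wins

# Split and select feature subsets for the sub-model training.
feature_slices = [slice(0, 10), slice(10, 20), slice(20, 30)]
sub_models = [LogisticRegression(max_iter=1000),
              GradientBoostingClassifier(random_state=0),
              MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)]

for model, cols in zip(sub_models, feature_slices):
    model.fit(X[:, cols], y)

# Aggregate the sub-model outputs (here: a simple probability average).
proba = np.mean([m.predict_proba(X[:, c])[:, 1]
                 for m, c in zip(sub_models, feature_slices)], axis=0)
predictions = (proba >= 0.5).astype(int)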
We present a neural network technique for the analysis and extrapolation of time-series data called Neural Decomposition (ND). Units with a sinusoidal activation function are used to perform a Fourier-like decomposition of training samples into a sum of sinusoids, augmented by units with nonperiodic activation functions to capture linear trends and other nonperiodic components. We show how careful weight initialization can be combined with regularization to form a simple model that generalizes well. Our method generalizes effectively on the Mackey-Glass series, a dataset of unemployment rates as reported by the U.S. Department of Labor Statistics, a time-series of monthly international airline passengers, and an unevenly sampled time-series of oxygen isotope measurements from a cave in north India. We find that ND outperforms popular time-series forecasting techniques including LSTM, echo state networks, (S)ARIMA, and SVR with a radial basis function.
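A toy version of the decomposition idea, assuming PyTorch and omitting the paper's careful weight initialization and regularization: a layer of sinusoid units plus a linear unit is fit to a synthetic 1-D series and then extrapolated beyond the training interval.

# Sinusoid units plus a linear trend unit, fit by gradient descent.
import torch
import torch.nn as nn

class TinyND(nn.Module):
    def __init__(self, n_sinusoids=16):
        super().__init__()
        self.periodic = nn.Linear(1, n_sinusoids)   # frequencies and phases
        self.amplitudes = nn.Linear(n_sinusoids, 1, bias=False)
        self.trend = nn.Linear(1, 1)                # nonperiodic linear component

    def forward(self, t):
        return self.amplitudes(torch.sin(self.periodic(t))) + self.trend(t)

t = torch.linspace(0, 1, 200).unsqueeze(1)
y = torch.sin(2 * torch.pi * 4 * t) + 0.5 * t       # synthetic series with a trend

model = TinyND()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(t), y)
    loss.backward()
    opt.step()

# Extrapolate beyond the training interval.
future = torch.linspace(1, 1.5, 100).unsqueeze(1)
forecast = model(future)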
Phishing is a major concern on the Internet today, and many users are falling victim to criminals' deceitful tactics. Blacklisting is still the most common defence users have against phishing websites, but it is failing to cope with their increasing number. In recent years, researchers have devised modern ways of detecting such websites using machine learning. One such method is to build machine-learnt models of URL features to classify whether URLs are phishing. However, there are varying opinions on the best approach for features and algorithms. In this paper, the objective is to evaluate the performance of the Random Forest algorithm using a lexical-only dataset. The performance is benchmarked against other machine learning algorithms and additionally against results reported in the literature. Initial results from experiments indicate that the Random Forest algorithm performs best, yielding 86.9% accuracy.
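A sketch of a lexical-only pipeline of this kind: extract a handful of string-level features from each URL (the feature list and the two example URLs are illustrative, not the paper's dataset) and train a Random Forest on them.

# Lexical URL features feeding a Random Forest classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def lexical_features(url):
    return [len(url),
            url.count("."),
            url.count("-"),
            url.count("/"),
            sum(c.isdigit() for c in url),
            int("@" in url)]

urls = ["http://example.com/login",
        "http://paypa1-secure-update.example.xyz/verify@account"]
labels = [0, 1]                                   # 0 = benign, 1 = phishing

X = np.array([lexical_features(u) for u in urls])
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, labels)
print(clf.predict([lexical_features("http://example.com/home")]))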
In the last decade, numerous fake websites have been developed on the World Wide Web to mimic trusted websites, with the aim of stealing financial assets from users and organizations. This form of online attack is called phishing, and it has cost the online community and the various stakeholders hundreds of millions of dollars. Therefore, effective countermeasures that can accurately detect phishing are needed. Machine learning (ML) is a popular tool for data analysis and has recently shown promising results in combating phishing when contrasted with classic anti-phishing approaches, including awareness workshops, visualization, and legal solutions. This article investigates the applicability of ML techniques to detecting phishing attacks and describes their pros and cons. In particular, different types of ML techniques have been investigated to reveal the suitable options that can serve as anti-phishing tools. More importantly, we experimentally compare a large number of ML techniques on real phishing datasets and with respect to different metrics. The purpose of the comparison is to reveal the advantages and disadvantages of ML predictive models and to show their actual performance when it comes to phishing attacks. The experimental results show that Covering approach models are more appropriate as anti-phishing solutions, especially for novice users, because of their simple yet effective knowledge bases in addition to their good phishing detection rate.
Encryption is often not sufficient to secure communication, since it does not hide that communication takes place or who is communicating with whom. Covert channels hide the very existence of communication, enabling individuals to communicate secretly. Previous work proposed a covert channel hidden inside multi-player first-person shooter online game traffic (FPSCC). FPSCC has a low bit rate, but it is practically impossible to eliminate other than by blocking the overt game traffic. This paper shows that, with knowledge of the channel's encoding and using machine learning techniques, FPSCC can be detected with an accuracy of 95% or higher.