Bibliography
Unmanned systems are increasing in number, while their manning requirements remain the same. To decrease manpower demands, machine learning techniques and autonomy are gaining traction and visibility. One barrier is human perception and understanding of autonomy. Machine learning techniques can result in “black box” algorithms that may yield high fitness but poor comprehension by operators. However, Interactive Machine Learning (IML), a method of incorporating human input over the course of algorithm development using neuro-evolutionary machine-learning techniques, may offer a solution. IML is evaluated here for its impact on developing autonomous team behaviors in an area search task. Initial findings show that IML-generated search plans were chosen over plans generated using a non-interactive ML technique, even though participants trusted them slightly less. Further, participants discriminated between the two types of plans with a high degree of accuracy, suggesting that the IML approach imparts behavioral characteristics into algorithms, making them more recognizable. Together, the results lay the foundation for exploring how to team humans successfully with ML behavior.
Deep learning techniques have demonstrated the ability to perform a variety of object recognition tasks using visible imager data; however, deep learning has not been implemented as a means to autonomously detect and assess targets of interest in a physical security system. We demonstrate the use of transfer learning on a convolutional neural network (CNN) to significantly reduce training time while keeping detection accuracy of physical-security-relevant targets high. Unlike many detection algorithms employed by video analytics within physical security systems, this method does not rely on temporal data to construct a background scene; targets of interest can halt motion indefinitely and still be detected by the implemented CNN. A key advantage of using deep learning is the ability of a network to improve over time. Periodic retraining can lead to better detection and higher confidence rates. We investigate training data size versus CNN test accuracy using physical security video data. Due to the large number of visible imagers, the significant volume of data collected daily, and the ground truth data already produced by the currently deployed human-in-the-loop process, physical security systems present a unique environment that is well suited for analysis via CNNs. This could lead to the creation of an algorithmic element that reduces human burden and decreases human-analyzed nuisance alarms.
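As a rough illustration of the transfer-learning setup described above (not the authors' exact architecture or data), the following Python/PyTorch sketch freezes an ImageNet-pretrained backbone and retrains only a small classification head; the class count and the commented-out DataLoader are placeholders.

# A minimal sketch of transfer learning on a CNN, assuming PyTorch/torchvision
# and a hypothetical folder of labeled security-camera frames.
import torch
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained backbone instead of training from scratch.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the convolutional feature extractor; only the new head is trained,
# which is what keeps training time low.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a head for the security-relevant classes
# (e.g. person / vehicle / background; class names are placeholders).
num_classes = 3
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Training loop over a hypothetical DataLoader of (frame, label) batches:
# for frames, labels in loader:
#     optimizer.zero_grad()
#     loss = criterion(model(frames), labels)
#     loss.backward()
#     optimizer.step()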
New and unseen network attacks pose a great threat to signature-based detection systems. Consequently, machine learning-based approaches, which rely on features extracted from network data, have been designed to detect such attacks. A key problem is that the distribution of features can differ between the training and testing datasets, which degrades the performance of the learned models. Moreover, generating labeled datasets is very time-consuming and expensive, which undercuts the effectiveness of supervised learning approaches. In this paper, we propose using transfer learning to detect previously unseen attacks. The main idea is to learn, from labeled training sets and unlabeled testing sets containing different types of attacks, an optimized representation that is invariant to changes in attack behavior, and to feed this representation to a supervised classifier. To the best of our knowledge, this is the first effort to use a feature-based transfer learning technique to detect unseen variants of network attacks. Furthermore, this technique can be used with any common base classifier. We evaluated the technique on publicly available datasets, and the results demonstrate the effectiveness of transfer learning for detecting new network attacks.
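The abstract does not name the exact algorithm, so the following scikit-learn sketch only illustrates the general feature-based transfer idea: fit a shared low-dimensional representation on both the labeled source and unlabeled target data, then train an ordinary classifier on the projected source features. The synthetic data and all variable names are illustrative.

# A rough stand-in for the feature-based transfer idea (not the paper's exact
# algorithm): project labeled source and unlabeled target features into a
# shared low-dimensional space fit on both domains, then train a standard
# classifier on the projected source data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_source = rng.normal(size=(500, 20))           # labeled training flows
y_source = rng.integers(0, 2, size=500)         # 0 = benign, 1 = attack
X_target = rng.normal(loc=0.5, size=(300, 20))  # unlabeled flows, shifted distribution

# Fit the representation on the union of both domains so it reflects the
# target's feature distribution as well as the source's.
pca = PCA(n_components=10).fit(np.vstack([X_source, X_target]))

# Any common base classifier can sit on top of the shared representation.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(pca.transform(X_source), y_source)
predictions = clf.predict(pca.transform(X_target))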
Linking the growing IPv6 deployment to existing IPv4 addresses is an interesting field of research, be it for network forensics, structural analysis, or reconnaissance. In this work, we focus on classifying pairs of server IPv6 and IPv4 addresses as siblings, i.e., running on the same machine. Our methodology leverages active measurements of TCP timestamps and other network characteristics, which we measure against a diverse ground truth of 682 hosts. We define and extract a set of features, including estimation of variable (as opposed to constant) remote clock skew. On these features, we train a manually crafted algorithm as well as a machine-learned decision tree. By conducting several measurement runs and training in cross-validation rounds, we aim to create models that generalize well and do not overfit our training data. We find both models to exceed 99% precision in training and test performance. We validate scalability by classifying 149k siblings in a large-scale measurement of 371k sibling candidates. We argue that this methodology, thoroughly cross-validated and likely to generalize well, can aid comparative studies of IPv6 and IPv4 behavior in the Internet. Striving for applicability and replicability, we release ready-to-use source code and raw data from our study.
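A minimal sketch of the machine-learned half of this methodology, assuming a hand-crafted feature matrix (the feature names in the comment are placeholders rather than the authors' exact feature set): train a decision tree and report cross-validated precision.

# Illustrative only: decision tree on synthetic sibling features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Each row: [clock-skew difference, timestamp-offset difference, option-fingerprint match]
X = rng.normal(size=(682, 3))
y = rng.integers(0, 2, size=682)   # 1 = sibling pair, 0 = non-sibling

tree = DecisionTreeClassifier(max_depth=5, random_state=0)
scores = cross_val_score(tree, X, y, cv=5, scoring="precision")
print("cross-validated precision:", scores.mean())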
Vehicular ad hoc networks (VANETs) are attracting more attention from both academia and the automotive industry due to the rapid development of wireless communication technologies. With this development, connected cars are increasingly being equipped with more sensors, processors, storage, and communication devices as they start to provide both infotainment and safety services through V2X communication. This growth in connected vehicles also brings a rise in security attacks and potential security threats. In a vehicular environment, security is one of the most important issues, and it must be addressed before VANETs can be widely deployed. Conventional VANETs have some unique characteristics such as high mobility, dynamic topology, and short connection times. Since an attacker can launch unexpected attacks at any time, it is difficult to predict these attacks in advance. To handle this problem, we propose a collaborative security attack detection mechanism in software-defined vehicular networks that uses a multi-class support vector machine (SVM) to detect various types of attacks dynamically. We compare our security mechanism to an existing distributed approach and present simulation results. The results demonstrate that the proposed security mechanism can effectively identify the types of attacks and achieves good performance in terms of precision, recall, and accuracy.
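A minimal multi-class SVM sketch in the spirit of this mechanism, assuming flow-level statistics collected at the SDN controller; the synthetic data and the class labels are placeholders, not the paper's feature set.

# Multi-class SVM on synthetic per-flow statistics.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 8))        # per-flow statistics
y = rng.integers(0, 4, size=1000)     # 0 = normal, 1..3 = attack classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# SVC handles multi-class classification via a one-vs-one scheme internally.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_tr, y_tr)
print(classification_report(y_te, svm.predict(X_te)))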
As smart grid systems become increasingly reliant on networks of control devices, attacks on their inherent security vulnerabilities could lead to catastrophic system failures. Network Intrusion Detection Systems (NIDS) detect such attacks by learning traffic patterns and finding anomalies in them. However, data for robust training and evaluation of NIDS is rarely available due to the operational and security risks associated with sharing such data. Consequently, we present Melody, a scalable framework for synthesizing such datasets. Melody models both the cyber and physical components of the smart grid by integrating a simulated physical network with an emulated cyber network, using virtual time for high temporal fidelity. We present a systematic approach to generate traffic representing multi-stage attacks, where each stage is either emulated or recreated with a mechanism to replay arbitrary packet traces. We describe and evaluate the suitability of Melody's datasets for intrusion detection by analyzing the extent to which temporal accuracy of pertinent features is maintained.
With the increasing use of the Internet, it is equally important to protect intellectual property. For copyright protection, a blind digital watermarking algorithm using SVD and OSELM in the IWT domain has been proposed. During the embedding process, SVD is applied to the coefficient blocks in the IWT domain to obtain the singular values. The singular values are modulated to embed the watermark in the host image. An online sequential extreme learning machine (OSELM) is trained to learn the relationship between the original coefficients and the corresponding watermarked versions. During the extraction process, this trained OSELM is used to extract the embedded watermark logo blindly, as no original host image is required. The watermarked image is subjected to various attacks such as blurring, noise, sharpening, rotation, and cropping. The experimental results show that the proposed watermarking scheme is robust against these attacks. The extracted watermark closely resembles the original watermark and is effective for proving ownership.
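A hedged sketch of the embedding step only, with an ordinary discrete wavelet transform (via pywt) standing in for the integer wavelet transform, a fixed 8x8 block size, and a simple additive modulation of the largest singular value; none of these choices are claimed to match the paper's exact scheme, and the OSELM training step is omitted.

# Embedding sketch: DWT as a stand-in for IWT, SVD per coefficient block.
import numpy as np
import pywt

def embed_block(block, bit, alpha=10.0):
    """Modulate the largest singular value of an 8x8 coefficient block."""
    U, S, Vt = np.linalg.svd(block, full_matrices=False)
    S[0] = S[0] + alpha * bit          # bit is 0 or 1 from the watermark logo
    return U @ np.diag(S) @ Vt

host = np.random.rand(256, 256)        # placeholder host image
LL, (LH, HL, HH) = pywt.dwt2(host, "haar")

watermark_bits = np.random.randint(0, 2, size=(LL.shape[0] // 8) * (LL.shape[1] // 8))

k = 0
for i in range(0, LL.shape[0], 8):
    for j in range(0, LL.shape[1], 8):
        LL[i:i+8, j:j+8] = embed_block(LL[i:i+8, j:j+8], watermark_bits[k])
        k += 1

watermarked = pywt.idwt2((LL, (LH, HL, HH)), "haar")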
Smartphones have become the pervasive personal computing platform. Recent years have thus witnessed exponential growth in research and development of secure and usable authentication schemes for smartphones. Several explicit (e.g., PIN-based) and/or implicit (e.g., biometrics-based) authentication methods have been designed and published in the literature, and some of them have been embedded in commercial mobile products as well. However, the published studies report only the brighter side of the proposed scheme(s), e.g., the higher accuracy attained by the proposed mechanism, while other associated operational issues, such as computational overhead, robustness to different environmental conditions/attacks, and usability, are intentionally or unintentionally ignored. More specifically, most published frameworks do not discuss or explore usability or environment-related measures, or any evaluation criterion beyond accuracy under the zero-effort threat model. Thus, their baseline operations usually give a false sense of progress. This paper, therefore, presents guidelines to researchers for designing, implementing, and evaluating smartphone user authentication methods so as to have a positive impact on future technological developments.
A technique and algorithms for early detection of an attack in progress and subsequent blocking of malicious traffic are proposed. The initial separation of mixed traffic into trustworthy and malicious traffic is carried out using cluster analysis. Newly arriving requests are then classified using different classifiers, trained on the resulting samples and evaluated against the developed success criteria.
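An illustrative two-stage sketch of this idea: cluster the mixed traffic into two groups, treat the cluster assignments as provisional trustworthy/malicious labels, and train a classifier that can score newly arriving requests. The features and the classifier choice are synthetic placeholders.

# Stage 1: unsupervised separation; stage 2: supervised classification of new requests.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X_mixed = np.vstack([rng.normal(0, 1, size=(400, 5)),     # normal requests
                     rng.normal(3, 1, size=(100, 5))])    # attack requests

# Stage 1: cluster the mixed traffic into two groups.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_mixed)

# Stage 2: use the clustered data as a training sample for a classifier
# that scores each newly arriving request.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_mixed, labels)
new_request = rng.normal(3, 1, size=(1, 5))
print("predicted cluster/class:", clf.predict(new_request)[0])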
Data analytics is being increasingly used in cyber-security problems and has been found useful in cases where data volume and heterogeneity make manual assessment by security experts cumbersome. In practical cyber-security scenarios involving data-driven analytics, obtaining data with annotations (i.e., ground-truth labels) is a challenging and well-known limiting factor for many supervised security analytics tasks. Significant portions of the large datasets typically remain unlabelled, as the task of annotation is extensively manual and requires a huge amount of expert intervention. In this paper, we propose an effective active learning approach that can efficiently address this limitation in a practical cyber-security problem of phishing categorization, whereby we use a human-machine collaborative approach to design a semi-supervised solution. An initial classifier is learnt on a small amount of the annotated data and is then gradually updated in an iterative manner by shortlisting only those samples from the large pool of unlabelled data that are most likely to improve classifier performance quickly. Prioritized active learning shows significant promise for achieving faster convergence in classification performance in a batch learning framework, thus requiring even less human annotation effort. A useful feature-weight update technique combined with active learning shows promising classification performance for categorizing phishing/malicious URLs without requiring a large amount of annotated training samples to be available during training. In experiments with several collections of PhishMonger's Targeted Brand dataset, the proposed method shows significant improvement over the baseline by as much as 12%.
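A minimal pool-based active-learning loop with uncertainty sampling, offered as a stand-in for the prioritized selection described above (the paper's actual prioritization and feature-weight update are not reproduced); the URL feature matrix, batch size, and round count are arbitrary.

# Pool-based active learning: query the least certain samples each round.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X_pool = rng.normal(size=(2000, 30))
y_pool = rng.integers(0, 2, size=2000)       # hidden labels, revealed on query

labeled = list(range(20))                    # small initial annotated set
unlabeled = [i for i in range(2000) if i not in labeled]

clf = LogisticRegression(max_iter=1000)
for _ in range(10):                          # 10 annotation rounds
    clf.fit(X_pool[labeled], y_pool[labeled])
    # Query the batch of samples the classifier is least certain about.
    probs = clf.predict_proba(X_pool[unlabeled])[:, 1]
    uncertainty = np.abs(probs - 0.5)
    batch = [unlabeled[i] for i in np.argsort(uncertainty)[:25]]
    labeled.extend(batch)                    # a human would annotate these samples
    unlabeled = [i for i in unlabeled if i not in batch]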
Several proposed defect prediction models are effective when historical datasets are available, but defect prediction becomes difficult when no historical data exist. Cross-project defect prediction (CPDP), which uses projects from other sources/companies to predict defects in the target projects, has been proposed in recent studies and has shown promising results. However, the performance of most CPDP approaches is still far from satisfactory, mainly due to the distribution mismatch between the source and target projects. In this study, a credibility theory based Naïve Bayes (CNB) classifier is proposed to establish a novel reweighting mechanism between the source and target projects, so that the source data can simultaneously adapt to the target data distribution and retain its own pattern. Our experimental results show the feasibility of the novel algorithm design and demonstrate significant improvement by CNB over other CPDP approaches in terms of the performance metrics considered.
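The credibility-theory weighting itself is not reproduced here; the following scikit-learn sketch only illustrates the general reweighting idea, using a crude inverse-distance similarity to the target centroid as the instance weight for a Gaussian Naive Bayes classifier trained on source-project data.

# Instance-weighted Naive Bayes as a rough analogue of the reweighting idea.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(5)
X_source = rng.normal(0.0, 1.0, size=(800, 10))   # metrics from other projects
y_source = rng.integers(0, 2, size=800)           # 1 = defective module
X_target = rng.normal(0.8, 1.2, size=(200, 10))   # unlabeled target project

# Crude similarity weight: inverse distance from each source instance to the
# target centroid (the actual CNB weights are derived from credibility theory).
target_centroid = X_target.mean(axis=0)
dist = np.linalg.norm(X_source - target_centroid, axis=1)
weights = 1.0 / (1.0 + dist)

nb = GaussianNB()
nb.fit(X_source, y_source, sample_weight=weights)
target_predictions = nb.predict(X_target)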
Software Defined Networks (SDNs) have gained prominence recently due to their flexible management and superior configuration functionality of the underlying network. SDNs, with OpenFlow as their primary implementation, allow the use of a centralised controller to drive decision making for all the supported devices in the network and to manage traffic through routing table changes for incoming flows. In conventional networks, machine learning has been shown to detect malicious intrusions and classify attacks such as DoS, user-to-root, and probe attacks. In this work, we extend the use of machine learning to improve traffic tolerance for SDNs. To achieve this, we extend the functionality of the controller to include a resilience framework, ReSDN, that incorporates machine learning to distinguish DoS attacks, focusing on the Neptune attack in our experiments. Our model is trained using the MIT KDD 1999 dataset. The system is developed as a module on top of the POX controller platform and evaluated using the Mininet simulator.
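A sketch of the detection component only (the POX/Mininet integration is out of scope here), assuming the KDD'99 data as shipped with scikit-learn and keeping only the numeric feature columns; the classifier choice is illustrative rather than the ReSDN model.

# Train a classifier to separate 'neptune' flows from normal KDD'99 traffic.
import numpy as np
from sklearn.datasets import fetch_kddcup99
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

kdd = fetch_kddcup99(percent10=True)                 # downloads on first use
mask = np.isin(kdd.target, [b"normal.", b"neptune."])
# Drop the categorical protocol/service/flag columns (indices 1, 2, 3).
numeric_cols = [i for i in range(kdd.data.shape[1]) if i not in (1, 2, 3)]
X = kdd.data[mask][:, numeric_cols].astype(float)
y = (kdd.target[mask] == b"neptune.").astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
print("neptune detection accuracy:", accuracy_score(y_te, clf.predict(X_te)))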
Security issues in IoT-based cyber-physical systems (CPS) are exacerbated by human participation in cyber-physical-human systems (CPHS), due to vulnerabilities in both the technologies and the human involvement. A holistic framework for mitigating security threats in the IoT-based CPHS environment is presented to address these issues. We have developed a threat model involving human elements in the CPHS environment. Research questions, directions, and ideas with respect to securing IoT-based CPHS against collaborative attacks are presented.
Cloud computing has become prominent, with a seemingly unlimited amount of storage and computation available to users. Yet security is a major issue that hampers the growth of the cloud. In this research we investigate a collaborative Intrusion Detection System (IDS) based on the ensemble learning method. It uses weak classifiers and exploits the untapped resources of the cloud to detect various types of attacks on the cloud system. In the proposed system, tasks are distributed among the available virtual machines (VMs), and the individual results are then merged for the final adaptation of the learning model. Performance evaluation is carried out using decision trees and fuzzy classifiers on KDD99, one of the largest datasets for IDS. The dataset is segmented in order to mimic the behavior of real-time data traffic occurring in a real cloud environment. The experimental results show that the proposed approach reduces execution time with improved accuracy and is fault-tolerant when handling VM failures. The system is a proof-of-concept model for a scalable, cloud-based distributed system that is able to exploit untapped resources, and it may serve as a base model for a real-time hierarchical IDS.
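A conceptual sketch of the distribute-then-merge idea on synthetic data: each "VM" trains a weak learner on its own segment of the dataset and the individual predictions are merged by majority vote. A real deployment would run the segments on separate virtual machines rather than in one process.

# Segment the data, train one weak learner per segment, merge by majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)
X = rng.normal(size=(3000, 12))
y = rng.integers(0, 2, size=3000)

# Segment the dataset to mimic traffic arriving at different VMs.
segments = np.array_split(np.arange(3000), 5)
models = [DecisionTreeClassifier(max_depth=3, random_state=0)
          .fit(X[idx], y[idx]) for idx in segments]

def ensemble_predict(models, X_new):
    # Merge individual VM results by majority voting.
    votes = np.stack([m.predict(X_new) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

print(ensemble_predict(models, X[:5]))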
This paper introduces an ensemble model that solves the binary classification problem by combining basic logistic regression with two recent advanced paradigms: extreme gradient boosted decision trees (xgboost) and deep learning. To obtain the best result when integrating sub-models, we introduce a solution for splitting and selecting sets of features for the sub-model training. In addition to the ensemble model, we propose a flexible, robust, and highly scalable new scheme for building a composite classifier that tries to simultaneously implement multiple layers of model decomposition and output aggregation to maximally reduce both the bias and variance (spread) components of classification error. We demonstrate the power of our ensemble model on the problem of predicting the outcome of Hearthstone, a turn-based computer game, based on game state information. The excellent predictive performance of our model was confirmed by a second place in the final ranking among 188 competing teams.
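A simplified sketch of the scheme using scikit-learn stand-ins (GradientBoostingClassifier for xgboost and a small MLP for the deep-learning sub-model); each sub-model is trained on its own slice of a synthetic feature matrix and the predicted probabilities are averaged, which is only one of many possible aggregation layers.

# Feature-split sub-models with a simple probability-averaging aggregation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 30))          # game-state features (synthetic)
y = rng.integers(0, 2, size=1000)        # 1 = first player wins

# Split and select feature subsets for the sub-model training.
feature_slices = [slice(0, 10), slice(10, 20), slice(20, 30)]
sub_models = [LogisticRegression(max_iter=1000),
              GradientBoostingClassifier(random_state=0),
              MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)]

for model, cols in zip(sub_models, feature_slices):
    model.fit(X[:, cols], y)

# Aggregate the sub-model outputs (here: a simple probability average).
proba = np.mean([m.predict_proba(X[:, c])[:, 1]
                 for m, c in zip(sub_models, feature_slices)], axis=0)
predictions = (proba >= 0.5).astype(int)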
We present a neural network technique for the analysis and extrapolation of time-series data called Neural Decomposition (ND). Units with a sinusoidal activation function are used to perform a Fourier-like decomposition of training samples into a sum of sinusoids, augmented by units with nonperiodic activation functions to capture linear trends and other nonperiodic components. We show how careful weight initialization can be combined with regularization to form a simple model that generalizes well. Our method generalizes effectively on the Mackey-Glass series, a dataset of unemployment rates as reported by the U.S. Department of Labor Statistics, a time-series of monthly international airline passengers, and an unevenly sampled time-series of oxygen isotope measurements from a cave in north India. We find that ND outperforms popular time-series forecasting techniques including LSTM, echo state networks, (S)ARIMA, and SVR with a radial basis function.
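A toy version of the decomposition idea, assuming PyTorch and omitting the paper's careful weight initialization and regularization: a layer of sinusoid units plus a linear unit is fit to a synthetic 1-D series and then extrapolated beyond the training interval.

# Sinusoid units plus a linear trend unit, fit by gradient descent.
import torch
import torch.nn as nn

class TinyND(nn.Module):
    def __init__(self, n_sinusoids=16):
        super().__init__()
        self.periodic = nn.Linear(1, n_sinusoids)   # frequencies and phases
        self.amplitudes = nn.Linear(n_sinusoids, 1, bias=False)
        self.trend = nn.Linear(1, 1)                # nonperiodic linear component

    def forward(self, t):
        return self.amplitudes(torch.sin(self.periodic(t))) + self.trend(t)

t = torch.linspace(0, 1, 200).unsqueeze(1)
y = torch.sin(2 * torch.pi * 4 * t) + 0.5 * t       # synthetic series with a trend

model = TinyND()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(t), y)
    loss.backward()
    opt.step()

# Extrapolate beyond the training interval.
future = torch.linspace(1, 1.5, 100).unsqueeze(1)
forecast = model(future)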
Phishing is a major concern on the Internet today, and many users are falling victim to criminals' deceitful tactics. Blacklisting is still the most common defence users have against phishing websites, but it is failing to cope with their increasing number. In recent years, researchers have devised modern ways of detecting such websites using machine learning. One such method is to build machine-learnt models of URL features to classify whether URLs are phishing. However, there are varying opinions on the best approach for features and algorithms. In this paper, the objective is to evaluate the performance of the Random Forest algorithm using a lexical-only dataset. The performance is benchmarked against other machine learning algorithms and additionally against results reported in the literature. Initial results from experiments indicate that the Random Forest algorithm performs best, yielding 86.9% accuracy.
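A sketch of a lexical-only pipeline of this kind: extract a handful of string-level features from each URL (the feature list and the two example URLs are illustrative, not the paper's dataset) and train a Random Forest on them.

# Lexical URL features feeding a Random Forest classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def lexical_features(url):
    return [len(url),
            url.count("."),
            url.count("-"),
            url.count("/"),
            sum(c.isdigit() for c in url),
            int("@" in url)]

urls = ["http://example.com/login",
        "http://paypa1-secure-update.example.xyz/verify@account"]
labels = [0, 1]                                   # 0 = benign, 1 = phishing

X = np.array([lexical_features(u) for u in urls])
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, labels)
print(clf.predict([lexical_features("http://example.com/home")]))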
In the last decade, numerous fake websites have been developed on the World Wide Web to mimic trusted websites, with the aim of stealing financial assets from users and organizations. This form of online attack is called phishing, and it has cost the online community and the various stakeholders hundreds of millions of dollars. Therefore, effective countermeasures that can accurately detect phishing are needed. Machine learning (ML) is a popular tool for data analysis and has recently shown promising results in combating phishing when contrasted with classic anti-phishing approaches, including awareness workshops, visualization, and legal solutions. This article investigates the applicability of ML techniques to detecting phishing attacks and describes their pros and cons. In particular, different types of ML techniques have been investigated to reveal the suitable options that can serve as anti-phishing tools. More importantly, we experimentally compare a large number of ML techniques on real phishing datasets and with respect to different metrics. The purpose of the comparison is to reveal the advantages and disadvantages of ML predictive models and to show their actual performance when it comes to phishing attacks. The experimental results show that Covering approach models are more appropriate as anti-phishing solutions, especially for novice users, because of their simple yet effective knowledge bases in addition to their good phishing detection rate.
Encryption is often not sufficient to secure communication, since it does not hide that communication takes place or who is communicating with whom. Covert channels hide the very existence of communication, enabling individuals to communicate secretly. Previous work proposed a covert channel hidden inside multi-player first-person shooter online game traffic (FPSCC). FPSCC has a low bit rate, but it is practically impossible to eliminate other than by blocking the overt game traffic. This paper shows that, with knowledge of the channel's encoding and using machine learning techniques, FPSCC can be detected with an accuracy of 95% or higher.