Biblio
Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The exhibited heterogeneity of workload and server characteristics in Cloud-scale or Internet-scale systems is adding further complexity and new challenges to the problem. Compared with,,,, existing solutions based on ad-hoc heuristics, Machine Learning (ML) has the potential to improve further the efficiency of resource management in large-scale systems. In this paper we,,,, will describe and discuss how ML could be used to understand automatically both workloads and environments, and to help to cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing application's QoSs, and mitigating tailed stragglers. We will introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study that deals with performance-centric node classification and straggler mitigation. We believe that an MLbased method will help to achieve architectural optimization and efficiency improvement.
SQL Injection continues to be one of the most damaging security exploits in terms of personal information exposure as well as monetary loss. Injection attacks are the number one vulnerability in the most recent OWASP Top 10 report, and the number of these attacks continues to increase. Traditional defense strategies often involve static, signature-based IDS (Intrusion Detection System) rules which are mostly effective only against previously observed attacks but not unknown, or zero-day, attacks. Much current research involves the use of machine learning techniques, which are able to detect unknown attacks, but depending on the algorithm can be costly in terms of performance. In addition, most current intrusion detection strategies involve collection of traffic coming into the web application either from a network device or from the web application host, while other strategies collect data from the database server logs. In this project, we are collecting traffic from two points: at the web application host, and at a Datiphy appliance node located between the webapp host and the associated MySQL database server. In our analysis of these two datasets, and another dataset that is correlated between the two, we have been able to demonstrate that accuracy obtained with the correlated dataset using algorithms such as rule-based and decision tree are nearly the same as those with a neural network algorithm, but with greatly improved performance.
In big data era, machine learning is one of fundamental techniques in intrusion detection systems (IDSs). Poisoning attack, which is one of the most recognized security threats towards machine learning- based IDSs, injects some adversarial samples into the training phase, inducing data drifting of training data and a significant performance decrease of target IDSs over testing data. In this paper, we adopt the Edge Pattern Detection (EPD) algorithm to design a novel poisoning method that attack against several machine learning algorithms used in IDSs. Specifically, we propose a boundary pattern detection algorithm to efficiently generate the points that are near to abnormal data but considered to be normal ones by current classifiers. Then, we introduce a Batch-EPD Boundary Pattern (BEBP) detection algorithm to overcome the limitation of the number of edge pattern points generated by EPD and to obtain more useful adversarial samples. Based on BEBP, we further present a moderate but effective poisoning method called chronic poisoning attack. Extensive experiments on synthetic and three real network data sets demonstrate the performance of the proposed poisoning method against several well-known machine learning algorithms and a practical intrusion detection method named FMIFS-LSSVM-IDS.
Two-factor authentication (2FA) popularly works by verifying something the user knows (a password) and something she possesses (a token, popularly instantiated with a smart phone). Conventional 2FA systems require extra interaction like typing a verification code, which is not very user-friendly. For improved user experience, recent work aims at zero-effort 2FA, in which a smart phone placed close to a computer (where the user enters her username/password into a browser to log into a server) automatically assists with the authentication. To prove her possession of the smart phone, the user needs to prove the phone is on the login spot, which reduces zero-effort 2FA to co-presence detection. In this paper, we propose SoundAuth, a secure zero-effort 2FA mechanism based on (two kinds of) ambient audio signals. SoundAuth looks for signs of proximity by having the browser and the smart phone compare both their surrounding sounds and certain unpredictable near-ultrasounds; if significant distinguishability is found, SoundAuth rejects the login request. For the ambient signals comparison, we regard it as a classification problem and employ a machine learning technique to analyze the audio signals. Experiments with real login attempts show that SoundAuth not only is comparable to existent schemes concerning utility, but also outperforms them in terms of resilience to attacks. SoundAuth can be easily deployed as it is readily supported by most smart phones and major browsers.
We explore methods of producing adversarial examples on deep generative models such as the variational autoencoder (VAE) and the VAE-GAN. Deep learning architectures are known to be vulnerable to adversarial examples, but previous work has focused on the application of adversarial examples to classification tasks. Deep generative models have recently become popular due to their ability to model input data distributions and generate realistic examples from those distributions. We present three classes of attacks on the VAE and VAE-GAN architectures and demonstrate them against networks trained on MNIST, SVHN and CelebA. Our first attack leverages classification-based adversaries by attaching a classifier to the trained encoder of the target generative model, which can then be used to indirectly manipulate the latent representation. Our second attack directly uses the VAE loss function to generate a target reconstruction image from the adversarial example. Our third attack moves beyond relying on classification or the standard loss for the gradient and directly optimizes against differences in source and target latent representations. We also motivate why an attacker might be interested in deploying such techniques against a target generative network.
This paper investigates the use of deep reinforcement learning (DRL) in the design of a "universal" MAC protocol referred to as Deep-reinforcement Learning Multiple Access (DLMA). The design framework is partially inspired by the vision of DARPA SC2, a 3-year competition whereby competitors are to come up with a clean-slate design that "best share spectrum with any network(s), in any environment, without prior knowledge, leveraging on machine-learning technique". While the scope of DARPA SC2 is broad and involves the redesign of PHY, MAC, and Network layers, this paper's focus is narrower and only involves the MAC design. In particular, we consider the problem of sharing time slots among a multiple of time-slotted networks that adopt different MAC protocols. One of the MAC protocols is DLMA. The other two are TDMA and ALOHA. The DRL agents of DLMA do not know that the other two MAC protocols are TDMA and ALOHA. Yet, by a series of observations of the environment, its own actions, and the rewards - in accordance with the DRL algorithmic framework - a DRL agent can learn the optimal MAC strategy for harmonious co-existence with TDMA and ALOHA nodes. In particular, the use of neural networks in DRL (as opposed to traditional reinforcement learning) allows for fast convergence to optimal solutions and robustness against perturbation in hyper- parameter settings, two essential properties for practical deployment of DLMA in real wireless networks.
The increasing amount of malware variants seen in the wild is causing problems for Antivirus Software vendors, unable to keep up by creating signatures for each. The methods used to develop a signature, static and dynamic analysis, have various limitations. Machine learning has been used by Antivirus vendors to detect malware based on the information gathered from the analysis process. However, adversarial examples can cause machine learning algorithms to miss-classify new data. In this paper we describe a method for malware analysis by converting malware binaries to images and then preparing those images for training within a Generative Adversarial Network. These unsupervised deep neural networks are not susceptible to adversarial examples. The conversion to images from malware binaries should be faster than using dynamic analysis and it would still be possible to link malware families together. Using the Generative Adversarial Network, malware detection could be much more effective and reliable.
Program authorship attribution has implications for the privacy of programmers who wish to contribute code anonymously. While previous work has shown that complete files that are individually authored can be attributed, these efforts have focused on ideal data sets such as the Google Code Jam data. We explore the problem of attribution "in the wild," examining source code obtained from open source version control systems, and investigate if and how such contributions can be attributed to their authors, either individually or on a per-account basis. In this work we show that accounts belonging to open source contributors containing short, incomplete, and typically uncompilable fragments can be effectively attributed.
Information security has become a growing concern. Computer covert channel which is regarded as an important area of information security research gets more attention. In order to detect these covert channels, a variety of detection algorithms are proposed in the course of the research. The algorithms of machine learning type show better results in these detection algorithms. However, the common machine learning algorithms have many problems in the testing process and have great limitations. Based on the deep learning algorithm, this paper proposes a new idea of network covert channel detection and forms a new detection model. On the one hand, this algorithmic model can detect more complex covert channels and, on the other hand, greatly improve the accuracy of detection due to the use of a new deep learning model. By optimizing this test model, we can get better results on the evaluation index.
Reconnaissance phase is where attackers identify their targets and how to collect information from professional social networks which can be used to select and exploit targeted employees to penetrate in an organization. Here, a framework is proposed for the early detection of attackers in the reconnaissance phase, highlighting the common characteristic behavior among attackers in professional social networks. And to create artificial honeypot profiles within the organizational social network which can be used to detect a potential incoming threat. By analyzing the dataset of social Network profiles in combination of machine learning techniques, A DspamRPfast model is proposed for the creation of a classifier system to predict the probabilities of the profiles being fake or malicious and to filter them out using XGBoost and for the faster classification and greater accuracy of 84.8%.
Error correction in quantum cryptography based on artificial neural networks is a new and promising solution. In this paper the security verification of this method is discussed and results of many simulations with different parameters are presented. The test scenarios assumed partially synchronized neural networks, typical for error rates in quantum cryptography. The results were also compared with scenarios based on the neural networks with random chosen weights to show the difficulty of passive attacks.
Machine learning, specifically deep learning is becoming a key technology component in application domains such as identity management, finance, automotive, and healthcare, to name a few. Proprietary machine learning models - Machine Learning IP - are developed and deployed at the network edge, end devices and in the cloud, to maximize user experience. With the proliferation of applications embedding Machine Learning IPs, machine learning models and hyper-parameters become attractive to attackers, and require protection. Major players in the semiconductor industry provide mechanisms on device to protect the IP at rest and during execution from being copied, altered, reverse engineered, and abused by attackers. In this work we explore system security architecture mechanisms and their applications to Machine Learning IP protection.
E-mail communication is one of today's indispensable communication ways. The widespread use of email has brought about some problems. The most important one of these problems are spam (unwanted) e-mails, often composed of advertisements or offensive content, sent without the recipient's request. In this study, it is aimed to analyze the content information of e-mails written in Turkish with the help of Naive Bayes Classifier and Vector Space Model from machine learning methods, to determine whether these e-mails are spam e-mails and classify them. Both methods are subjected to different evaluation criteria and their performances are compared.
Cybersecurity plays a critical role in protecting sensitive information and the structural integrity of networked systems. As networked systems continue to expand in numbers as well as in complexity, so does the threat of malicious activity and the necessity for advanced cybersecurity solutions. Furthermore, both the quantity and quality of available data on malicious content as well as the fact that malicious activity continuously evolves makes automated protection systems for this type of environment particularly challenging. Not only is the data quality a concern, but the volume of the data can be quite small for some of the classes. This creates a class imbalance in the data used to train a classifier; however, many classifiers are not well equipped to deal with class imbalance. One such example is detecting malicious HMTL files from static features. Unfortunately, collecting malicious HMTL files is extremely difficult and can be quite noisy from HTML files being mislabeled. This paper evaluates a specific application that is afflicted by these modern cybersecurity challenges: detection of malicious HTML files. Previous work presented a general framework for malicious HTML file classification that we modify in this work to use a $\chi$2 feature selection technique and synthetic minority oversampling technique (SMOTE). We experiment with different classifiers (i.e., AdaBoost, Gentle-Boost, RobustBoost, RusBoost, and Random Forest) and a pure detection model (i.e., Isolation Forest). We benchmark the different classifiers using SMOTE on a real dataset that contains a limited number of malicious files (40) with respect to the normal files (7,263). It was found that the modified framework performed better than the previous framework's results. However, additional evidence was found to imply that algorithms which train on both the normal and malicious samples are likely overtraining to the malicious distribution. We demonstrate the likely overtraining by determining that a subset of the malicious files, while suspicious, did not come from a malicious source.
Everyday., the DoS/DDoS attacks are increasing all over the world and the ways attackers are using changing continuously. This increase and variety on the attacks are affecting the governments, institutions, organizations and corporations in a bad way. Every successful attack is causing them to lose money and lose reputation in return. This paper presents an introduction to a method which can show what the attack and where the attack based on. This is tried to be achieved with using clustering algorithm DBSCAN on network traffic because of the change and variety in attack vectors.
Adversaries are conducting attack campaigns with increasing levels of sophistication. Additionally, with the prevalence of out-of-the-box toolkits that simplify attack operations during different stages of an attack campaign, multiple new adversaries and attack groups have appeared over the past decade. Characterizing the behavior and the modus operandi of different adversaries is critical in identifying the appropriate security maneuver to detect and mitigate the impact of an ongoing attack. To this end, in this paper, we study two characteristics of an adversary: Risk-averseness and Experience level. Risk-averse adversaries are more cautious during their campaign while fledgling adversaries do not wait to develop adequate expertise and knowledge before launching attack campaigns. One manifestation of these characteristics is through the adversary's choice and usage of attack tools. To detect these characteristics, we present multi-level machine learning (ML) models that use network data generated while under attack by different attack tools and usage patterns. In particular, for risk-averseness, we considered different configurations for scanning tools and trained the models in a testbed environment. The resulting model was used to predict the cautiousness of different red teams that participated in the Cyber Shield ‘16 exercise. The predictions matched the expected behavior of the red teams. For Experience level, we considered publicly-available remote access tools and usage patterns. We developed a Markov model to simulate usage patterns of attackers with different levels of expertise and through experiments on CyberVAN, we showed that the ML model has a high accuracy.
Efficient application of Internet of Battlefield Things (IoBT) technology on the battlefield calls for innovative solutions to control and manage the deluge of heterogeneous IoBT devices. This paper presents an innovative paradigm to address heterogeneity in controlling IoBT and IoT devices, enabling multi-force cooperation in challenging battlefield scenarios.
In recent years, deep convolution neural networks (DCNNs) have won many contests in machine learning, object detection, and pattern recognition. Furthermore, deep learning techniques achieved exceptional performance in image classification, reaching accuracy levels beyond human capability. Malware variants from similar categories often contain similarities due to code reuse. Converting malware samples into images can cause these patterns to manifest as image features, which can be exploited for DCNN classification. Techniques for converting malware binaries into images for visualization and classification have been reported in the literature, and while these methods do reach a high level of classification accuracy on training datasets, they tend to be vulnerable to overfitting and perform poorly on previously unseen samples. In this paper, we explore and document a variety of techniques for representing malware binaries as images with the goal of discovering a format best suited for deep learning. We implement a database for malware binaries from several families, stored in hexadecimal format. These malware samples are converted into images using various approaches and are used to train a neural network to recognize visual patterns in the input and classify malware based on the feature vectors. Each image type is assessed using a variety of learning models, such as transfer learning with existing DCNN architectures and feature extraction for support vector machine classifier training. Each technique is evaluated in terms of classification accuracy, result consistency, and time per trial. Our preliminary results indicate that improved image representation has the potential to enable more effective classification of new malware.
Security has always been a major issue in cloud. Data sources are the most valuable and vulnerable information which is aimed by attackers to steal. If data is lost, then the privacy and security of every cloud user are compromised. Even though a cloud network is secured externally, the threat of an internal attacker exists. Internal attackers compromise a vulnerable user node and get access to a system. They are connected to the cloud network internally and launch attacks pretending to be trusted users. Machine learning approaches are widely used for cloud security issues. The existing machine learning based security approaches classify a node as a misbehaving node based on short-term behavioral data. These systems do not differentiate whether a misbehaving node is a malicious node or a broken node. To address this problem, this paper proposes an Improvised Long Short-Term Memory (ILSTM) model which learns the behavior of a user and automatically trains itself and stores the behavioral data. The model can easily classify the user behavior as normal or abnormal. The proposed ILSTM not only identifies an anomaly node but also finds whether a misbehaving node is a broken node or a new user node or a compromised node using the calculated trust factor. The proposed model not only detects the attack accurately but also reduces the false alarm in the cloud network.
Due to the recent technological development, home appliances and electric devices are equipped with high-performance hardware device. Since demand of hardware devices is increased, production base become internationalized to mass-produce hardware devices with low cost and hardware vendors outsource their products to third-party vendors. Accordingly, malicious third-party vendors can easily insert malfunctions (also known as "hardware Trojans'') into their products. In this paper, we design six kinds of hardware Trojans at a gate-level netlist, and apply a neural-network (NN) based hardware-Trojan detection method to them. The designed hardware Trojans are different in trigger circuits. In addition, we insert them to normal circuits, and detect hardware Trojans using a machine-learning-based hardware-Trojan detection method with neural networks. In our experiment, we learned Trojan-infected benchmarks using NN, and performed cross validation to evaluate the learned NN. The experimental results demonstrate that the average TPR (True Positive Rate) becomes 72.9%, the average TNR (True Negative Rate) becomes 90.0%.
With the increase in the popularity of computerized online applications, the analysis, and detection of a growing number of newly discovered stealthy malware poses a significant challenge to the security community. Signature-based and behavior-based detection techniques are becoming inefficient in detecting new unknown malware. Machine learning solutions are employed to counter such intelligent malware and allow performing more comprehensive malware detection. This capability leads to an automatic analysis of malware behavior. The proposed oblique random forest ensemble learning technique is efficient for malware classification. The effectiveness of the proposed method is demonstrated with three malware classification datasets from various sources. The results are compared with other variants of decision tree learning models. The proposed system performs better than the existing system in terms of classification accuracy and false positive rate.