Biblio
In cloud computing environments with many virtual machines, containers, and other systems, an epidemic of malware can be crippling and highly threatening to business processes. In this vision paper, we introduce a hierarchical approach to performing malware detection and analysis using several recent advances in machine learning on graphs, hypergraphs, and natural language. We analyze individual systems and their logs, inspecting and understanding their behavior with attentional sequence models. Given a feature representation of each system's logs using this procedure, we construct an attributed network of the cloud with systems and other components as vertices and propose an analysis of malware with inductive graph and hypergraph learning models. With this foundation, we consider the multicloud case, in which multiple clouds with differing privacy requirements cooperate against the spread of malware, proposing the use of federated learning to perform inference and training while preserving privacy. Finally, we discuss several open problems that remain in defending cloud computing environments against malware related to designing robust ecosystems, identifying cloud-specific optimization problems for response strategy, action spaces for malware containment and eradication, and developing priors and transfer learning tasks for machine learning models in this area.
The increasing publication of large amounts of data, theoretically anonymous, can lead to a number of attacks on the privacy of people. The publication of sensitive data without exposing the data owners is generally not part of the software developers concerns. The regulations for the data privacy-preserving create an appropriate scenario to focus on privacy from the perspective of the use or data exploration that takes place in an organization. The increasing number of sanctions for privacy violations motivates the systematic comparison of three known machine learning algorithms in order to measure the usefulness of the data privacy preserving. The scope of the evaluation is extended by comparing them with a known privacy preservation metric. Different parameter scenarios and privacy levels are used. The use of publicly available implementations, the presentation of the methodology, explanation of the experiments and the analysis allow providing a framework of work on the problem of the preservation of privacy. Problems are shown in the measurement of the usefulness of the data and its relationship with the privacy preserving. The findings motivate the need to create optimized metrics on the privacy preferences of the owners of the data since the risks of predicting sensitive attributes by means of machine learning techniques are not usually eliminated. In addition, it is shown that there may be a hundred percent, but it cannot be measured. As well as ensuring adequate performance of machine learning models that are of interest to the organization that data publisher.
Cyber-Physical Systems (CPS) are growing with added complexity and functionality. Multidisciplinary interactions with physical systems are the major keys to CPS. However, sensors, actuators, controllers, and wireless communications are prone to attacks that compromise the system. Machine learning models have been utilized in controllers of automotive to learn, estimate, and provide the required intelligence in the control process. However, their estimation is also vulnerable to the attacks from physical or cyber domains. They have shown unreliable predictions against unknown biases resulted from the modeling. In this paper, we propose a novel control design using conditional generative adversarial networks that will enable a self-secured controller to capture the normal behavior of the control loop and the physical system, detect the anomaly, and recover from them. We experimented our novel control design on a self-secured BMS by driving a Nissan Leaf S on standard driving cycles while under various attacks. The performance of the design has been compared to the state-of-the-art; the self-secured BMS could detect the attacks with 83% accuracy and the recovery estimation error of 21% on average, which have improved by 28% and 8%, respectively.
Traditional anti-virus technologies have failed to keep pace with proliferation of malware due to slow process of their signatures and heuristics updates. Similarly, there are limitations of time and resources in order to perform manual analysis on each malware. There is a need to learn from this vast quantity of data, containing cyber attack pattern, in an automated manner to proactively adapt to ever-evolving threats. Machine learning offers unique advantages to learn from past cyber attacks to handle future cyber threats. The purpose of this research is to propose a framework for multi-classification of malware into well-known categories by applying different machine learning models over corpus of malware analysis reports. These reports are generated through an open source malware sandbox in an automated manner. We applied extensive pre-modeling techniques for data cleaning, features exploration and features engineering to prepare training and test datasets. Best possible hyper-parameters are selected to build machine learning models. These prepared datasets are then used to train the machine learning classifiers and to compare their prediction accuracy. Finally, these results are validated through a comprehensive 10-fold cross-validation methodology. The best results are achieved through Gaussian Naive Bayes classifier with random accuracy of 96% and 10-Fold Cross Validation accuracy of 91.2%. The said framework can be deployed in an operational environment to learn from malware attacks for proactively adapting matching counter measures.
This paper proposes a method to detect two primary means of using the Domain Name System (DNS) for malicious purposes. We develop machine learning models to detect information exfiltration from compromised machines and the establishment of command & control (C&C) servers via tunneling. We validate our approach by experiments where we successfully detect a malware used in several recent Advanced Persistent Threat (APT) attacks [1]. The novelty of our method is its robustness, simplicity, scalability, and ease of deployment in a production environment.