Biblio

Filters: Keyword is machine learning models
2021-07-27
Xu, Jiahui, Wang, Chen, Li, Tingting, Xiang, Fengtao.  2020.  Improved Adversarial Attack against Black-box Machine Learning Models. 2020 Chinese Automation Congress (CAC). :5907–5912.
The existence of adversarial samples calls the security of machine learning models in practical applications into question, especially under black-box adversarial attacks, which closely match real application scenarios. An efficient search for black-box attack samples helps train more robust models. We consider the setting in which the attacker obtains nothing but the final predicted label. The current state-of-the-art methods for this problem are the Boundary Attack (BA) and its variants, such as the Biased Boundary Attack (BBA); however, they still require a large number of queries and consume considerable time. In this paper, we propose a novel method that addresses these shortcomings. First, we improve the algorithm for generating initial adversarial samples so that they have a smaller L2 distance. Second, we combine a swarm intelligence algorithm, Particle Swarm Optimization (PSO), with the Biased Boundary Attack and propose the PSO-BBA method. Finally, we run experiments on the ImageNet dataset and compare our algorithm with the baseline. The results show that: (1) our improved initial-point selection algorithm effectively reduces the number of queries; (2) compared with the most advanced methods, our PSO-BBA method improves convergence speed while maintaining attack accuracy; (3) our method performs well for both targeted and untargeted attacks.
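To make the decision-based setting concrete, here is a minimal sketch (not the authors' code) of a PSO-driven search for a small-L2 adversarial perturbation when only a hard-label oracle is available; the black-box is a toy linear classifier standing in for a real network, and all hyper-parameters are illustrative.

```python
# Particles search for a small-L2 perturbation using only predicted labels.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=32)                      # toy black-box decision boundary

def predict_label(x):                        # hard-label oracle: all we may query
    return int(x @ w > 0)

x_orig = rng.normal(size=32)
x_orig -= (x_orig @ w + 1.0) * w / (w @ w)   # place the source on the label-0 side
target_label = 1 - predict_label(x_orig)

n_particles, iters = 20, 200
pos = rng.normal(size=(n_particles, 32))     # initial candidate perturbations
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_fit = np.full(n_particles, np.inf)

def fitness(delta):
    # Fitness = L2 norm, but only perturbations that flip the label are valid.
    if predict_label(x_orig + delta) != target_label:
        return np.inf
    return np.linalg.norm(delta)

for _ in range(iters):
    for i in range(n_particles):
        f = fitness(pos[i])
        if f < pbest_fit[i]:
            pbest_fit[i], pbest[i] = f, pos[i].copy()
    gbest = pbest[np.argmin(pbest_fit)]      # swarm-wide best perturbation
    r1, r2 = rng.random((2, n_particles, 1))
    vel = 0.5 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel

print("smallest adversarial L2 distance:", pbest_fit.min())
```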
2020-12-11
Payne, J., Kundu, A.  2019.  Towards Deep Federated Defenses Against Malware in Cloud Ecosystems. 2019 First IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA). :92–100.

In cloud computing environments with many virtual machines, containers, and other systems, an epidemic of malware can be crippling and highly threatening to business processes. In this vision paper, we introduce a hierarchical approach to performing malware detection and analysis using several recent advances in machine learning on graphs, hypergraphs, and natural language. We analyze individual systems and their logs, inspecting and understanding their behavior with attentional sequence models. Given a feature representation of each system's logs obtained through this procedure, we construct an attributed network of the cloud with systems and other components as vertices and propose an analysis of malware with inductive graph and hypergraph learning models. With this foundation, we consider the multi-cloud case, in which multiple clouds with differing privacy requirements cooperate against the spread of malware, and propose the use of federated learning to perform inference and training while preserving privacy. Finally, we discuss several open problems in defending cloud computing environments against malware: designing robust ecosystems, identifying cloud-specific optimization problems for response strategies, defining action spaces for malware containment and eradication, and developing priors and transfer learning tasks for machine learning models in this area.
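As a rough illustration of the federated step the paper envisions, the sketch below assumes each cloud holds private (here synthetic) behavioral feature vectors, trains a local logistic-regression classifier, and shares only model weights with an aggregator; the real system's attentional and graph models are out of scope here.

```python
# Federated averaging: raw log data never leaves a cloud, only weights do.
import numpy as np

rng = np.random.default_rng(0)
w_true = rng.normal(size=16)                 # hidden "ground truth" for the toy task

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """A few epochs of logistic-regression gradient descent on one cloud's data."""
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(X @ w)))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

# Three clouds, each with private synthetic feature vectors of system behavior.
clouds = []
for _ in range(3):
    X = rng.normal(size=(200, 16))
    y = (X @ w_true + rng.normal(scale=0.5, size=200) > 0).astype(float)
    clouds.append((X, y))

w_global = np.zeros(16)
for _ in range(20):                          # federated averaging rounds
    local = [local_sgd(w_global.copy(), X, y) for X, y in clouds]
    w_global = np.mean(local, axis=0)        # aggregator averages weights only

cos = w_global @ w_true / (np.linalg.norm(w_global) * np.linalg.norm(w_true))
print("cosine similarity to ground-truth weights:", cos)
```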

2020-11-20
Roy, D. D., Shin, D.  2019.  Network Intrusion Detection in Smart Grids for Imbalanced Attack Types Using Machine Learning Models. 2019 International Conference on Information and Communication Technology Convergence (ICTC). :576–581.
The smart grid has evolved as the next-generation power grid paradigm, enabling the transfer of real-time information between the utility company and the consumer via smart meters and advanced metering infrastructure (AMI). This information facilitates many services for both parties, such as automatic meter reading, demand-side management, and time-of-use (TOU) pricing. However, there have been growing security and privacy concerns over smart grid systems, which are built with both smart and legacy information and operational technologies. Intrusion detection is a critical security service for smart grid systems, alerting the system operator to the presence of ongoing attacks. Hence, much research has been conducted on intrusion detection in the past, especially anomaly-based intrusion detection. Problems emerge when common pattern-recognition approaches are applied to imbalanced data, in which far more instances represent normal behavior than attacks; such approaches yield low detection rates for the minority classes. In this paper, we study various machine learning models to overcome this drawback, using the CIC-IDS2018 dataset [1].
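The imbalance effect described above can be illustrated with a small scikit-learn sketch (synthetic data in place of CIC-IDS2018): with roughly 1% attack traffic, reweighting the classes is one simple way to raise recall on the minority attack class.

```python
# Compare minority-class recall with and without class weighting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# ~1% of samples are "attacks" (class 1), mimicking imbalanced traffic.
X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for weights in (None, "balanced"):
    clf = RandomForestClassifier(n_estimators=100, class_weight=weights,
                                 random_state=0)
    clf.fit(X_tr, y_tr)
    print(f"class_weight={weights}: attack-class recall =",
          recall_score(y_te, clf.predict(X_te)))
```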
2020-10-26
Sethi, Kamalakanta, Kumar, Rahul, Sethi, Lingaraj, Bera, Padmalochan, Patra, Prashanta Kumar.  2019.  A Novel Machine Learning Based Malware Detection and Classification Framework. 2019 International Conference on Cyber Security and Protection of Digital Services (Cyber Security). :1–4.
As time progresses, new and complex malware types are being generated that pose a serious threat to computer systems. Due to this drastic increase in the number of malware samples, signature-based malware detection techniques cannot provide accurate results. Different studies have demonstrated the proficiency of machine learning for the detection and classification of malware files. Further, the accuracy of these machine learning models can be improved by using feature selection algorithms to select the most essential features, reducing the size of the dataset and thereby the computation required. In this paper, we develop a machine learning based malware analysis framework for efficient and accurate malware detection and classification. We use the Cuckoo sandbox for dynamic analysis, which executes malware in an isolated environment and generates an analysis report based on the system activities observed during execution. Further, we propose a feature extraction and selection module that extracts features from the report and selects the most important ones to ensure high accuracy at minimum computation cost. We then employ different machine learning algorithms for accurate detection and fine-grained classification. Experimental results show high detection and classification accuracy in comparison with state-of-the-art approaches.
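A minimal sketch of the extraction-and-selection idea, assuming a simplified Cuckoo-style report with a behavior/apistats section; the reports, field names, and pipeline below are illustrative stand-ins, not the authors' implementation.

```python
# Turn sandbox reports into API-count feature vectors, select the most
# discriminative features, then classify.
import json
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def api_counts(report_json):
    """Flatten per-process API-call counts from a (simplified) report."""
    report = json.loads(report_json)
    counts = {}
    for proc in report.get("behavior", {}).get("apistats", {}).values():
        for api, n in proc.items():
            counts[api] = counts.get(api, 0) + n
    return counts

# Toy reports standing in for real sandbox output (labels: 1 = malware).
reports = [
    '{"behavior": {"apistats": {"1": {"CreateFileW": 3, "RegSetValueExW": 9}}}}',
    '{"behavior": {"apistats": {"1": {"CreateFileW": 5, "NtQuerySystemInformation": 1}}}}',
]
labels = [1, 0]

pipeline = make_pipeline(
    DictVectorizer(),                  # API-name -> column mapping
    SelectKBest(chi2, k=2),            # keep only the most discriminative calls
    LinearSVC(),
)
pipeline.fit([api_counts(r) for r in reports], labels)
```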
Walker, Aaron, Sengupta, Shamik.  2019.  Insights into Malware Detection via Behavioral Frequency Analysis Using Machine Learning. MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM). :1–6.
The most common defenses against malware threats involve the use of signatures derived from instances of known malware. However, the constant evolution of the malware threat landscape necessitates defense against unknown malware, making a signature catalog of known threats insufficient to prevent zero-day vulnerabilities from being exploited. Recent research has applied machine learning approaches to identify malware through artifacts of malicious activity as observed through dynamic behavioral analysis. We have seen that these approaches mimic common malware defenses by simply offering a method of detecting known malware. We contribute a new method of identifying software as malicious or benign through analysis of the frequency of Windows API system function calls. We show that this is a powerful technique for malware detection because it generates learning models that understand the difference between malicious and benign software, rather than producing a signature classifier for known malware. We also contribute a method of systematically comparing machine learning models across different datasets to determine their efficacy in accurately distinguishing malicious from benign software.
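The behavioral-frequency representation can be sketched as follows: each program becomes a vector of relative Windows API call frequencies, and several models are compared on the same representation; the call traces here are invented stand-ins for real dynamic-analysis logs.

```python
# Relative API-call frequencies as features; compare models by cross-validation.
import random
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def frequency_vector(trace):
    counts = Counter(trace)
    total = sum(counts.values())
    return {api: n / total for api, n in counts.items()}

random.seed(0)
apis = ["CreateFileW", "RegSetValueExW", "VirtualAlloc", "NtWriteFile"]

def make_trace(malicious):
    # Malicious programs skew toward registry writes and memory allocation here.
    weights = [1, 4, 4, 1] if malicious else [4, 1, 1, 4]
    return random.choices(apis, weights=weights, k=200)

labels = [1] * 20 + [0] * 20
X = DictVectorizer().fit_transform(frequency_vector(make_trace(m)) for m in labels)

for model in (LogisticRegression(max_iter=1000), KNeighborsClassifier()):
    scores = cross_val_score(model, X, labels, cv=5)
    print(type(model).__name__, "mean accuracy:", scores.mean())
```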
2020-04-20
Lecuyer, Mathias, Atlidakis, Vaggelis, Geambasu, Roxana, Hsu, Daniel, Jana, Suman.  2019.  Certified Robustness to Adversarial Examples with Differential Privacy. 2019 IEEE Symposium on Security and Privacy (SP). :656–672.
Adversarial examples that fool machine learning models, particularly deep neural networks, have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth. Most past defenses are best-effort and have been shown to be vulnerable to sophisticated attacks. Recently, a set of certified defenses has been introduced that provide guarantees of robustness to norm-bounded attacks. However, these defenses either do not scale to large datasets or are limited in the types of models they can support. This paper presents the first certified defense that both scales to large networks and datasets (such as Google's Inception network for ImageNet) and applies broadly to arbitrary model types. Our defense, called PixelDP, is based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically inspired privacy formalism, which provides a rigorous, generic, and flexible foundation for defense.
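The core PixelDP mechanism, as described in the abstract, can be sketched in a few lines: noise calibrated like a differential-privacy (Gaussian) mechanism is injected into an early representation, and the expected output over many noisy passes is what the certification reasons about. The network below is a toy stand-in, and the calibration values are illustrative.

```python
# Noise layer + averaged predictions, in the spirit of the PixelDP defense.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 32))            # toy classifier weights (10 classes)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def noisy_predict(x, L=1.0, eps=1.0, delta=1e-5, n_draws=100):
    # Standard Gaussian-mechanism scale for sensitivity L (illustrative values).
    sigma = L * np.sqrt(2 * np.log(1.25 / delta)) / eps
    probs = np.zeros(10)
    for _ in range(n_draws):
        x_noisy = x + rng.normal(scale=sigma, size=x.shape)
        probs += softmax(W @ x_noisy)
    return probs / n_draws               # expected scores: the certified quantity

x = rng.normal(size=32)
p = noisy_predict(x)
print("predicted class:", p.argmax(),
      "top-2 margin:", np.sort(p)[-1] - np.sort(p)[-2])
```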
Esquivel-Quiros, Luis Gustavo, Barrantes, Elena Gabriela, Darlington, Fernando Esponda.  2018.  Measuring data privacy preserving and machine learning. 2018 7th International Conference On Software Process Improvement (CIMPS). :85–94.

The increasing publication of large amounts of theoretically anonymous data can enable a number of attacks on people's privacy. Publishing sensitive data without exposing the data owners is generally not among software developers' concerns. Data privacy-preserving regulations create an appropriate scenario for focusing on privacy from the perspective of the data use or exploration that takes place within an organization. The increasing number of sanctions for privacy violations motivates a systematic comparison of three well-known machine learning algorithms in order to measure the usefulness of privacy-preserved data. The scope of the evaluation is extended by comparing them against a known privacy-preservation metric, under different parameter scenarios and privacy levels. The use of publicly available implementations, together with the presentation of the methodology, the explanation of the experiments, and the analysis, provides a working framework for the problem of privacy preservation. Problems are shown in measuring the usefulness of the data and in its relationship with privacy preservation. The findings motivate the need to create metrics optimized for the privacy preferences of the data owners, since the risk of predicting sensitive attributes by means of machine learning techniques is not usually eliminated, as well as the need to ensure adequate performance of the machine learning models that are of interest to the organization publishing the data.
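One way to picture the utility measurement the paper discusses is the sketch below (not the paper's protocol): features are generalized into coarser bins as a crude stand-in for anonymization, and classifier accuracy before and after quantifies the utility lost.

```python
# Coarser generalization (fewer bins) = more privacy, usually less utility.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

for bins in (None, 8, 2):                 # None = raw data
    if bins is None:
        Xp = X
    else:
        # Generalize each feature into equal-width bins.
        edges = np.linspace(X.min(axis=0), X.max(axis=0), bins + 1)
        Xp = np.stack([np.digitize(X[:, j], edges[1:-1, j])
                       for j in range(X.shape[1])], axis=1)
    acc = cross_val_score(LogisticRegression(max_iter=1000), Xp, y, cv=5).mean()
    print(f"bins={bins}: accuracy={acc:.3f}")
```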

2020-03-30
Scherzinger, Stefanie, Seifert, Christin, Wiese, Lena.  2019.  The Best of Both Worlds: Challenges in Linking Provenance and Explainability in Distributed Machine Learning. 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). :1620–1629.
Machine learning experts prefer to think of their input as a single, homogeneous, and consistent data set. However, when analyzing large volumes of data, the entire data set may not be manageable on a single server, but must be stored on a distributed file system instead. Moreover, with the pressing demand to deliver explainable models, the experts may no longer focus on the machine learning algorithms in isolation, but must take into account the distributed nature of the stored data, as well as the impact of any data pre-processing steps upstream in their data analysis pipeline. In this paper, we make the point that even basic transformations during data preparation can impact the model learned, and that this is exacerbated in a distributed setting. We then sketch our vision of end-to-end explainability of the model learned, taking the pre-processing into account. In particular, we point out the potential of linking research on data provenance with efforts on explainability in machine learning. In doing so, we highlight pitfalls we may experience in a distributed system on the way to generating more holistic explanations for our machine learning models.
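The paper's central point can be demonstrated in a few lines: standardizing each shard locally, as a naive distributed pipeline might, yields different regression coefficients than standardizing globally, even though nominally "the same" pre-processing ran. The data and partitioning below are synthetic.

```python
# Global vs per-shard standardization changes the learned model.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(loc=[0.0, 5.0], scale=[1.0, 3.0], size=(1000, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(size=1000)

def standardize(a):
    return (a - a.mean(axis=0)) / a.std(axis=0)

# Global pre-processing: one pass over the full data set.
coef_global = LinearRegression().fit(standardize(X), y).coef_

# Distributed pre-processing: non-iid shards (e.g. partitioned by region),
# each worker standardizing only its own shard.
order = np.argsort(X[:, 1])
X_sorted, y_sorted = X[order], y[order]
shards = np.array_split(np.arange(1000), 4)
X_local = np.vstack([standardize(X_sorted[s]) for s in shards])
coef_local = LinearRegression().fit(X_local, y_sorted).coef_

print("global coefficients:   ", coef_global)
print("per-shard coefficients:", coef_local)
```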
Jentzsch, Sophie F., Hochgeschwender, Nico.  2019.  Don't Forget Your Roots! Using Provenance Data for Transparent and Explainable Development of Machine Learning Models. 2019 34th IEEE/ACM International Conference on Automated Software Engineering Workshop (ASEW). :37–40.
Explaining the reasoning and behaviour of artificially intelligent systems to human users becomes increasingly urgent, especially in the field of machine learning. Many recent contributions approach this issue with post-hoc methods, meaning they consider the final system and its outcomes while the roots of the included artefacts are widely neglected. However, we argue in this position paper that there needs to be a stronger focus on the development process. Without insight into specific design decisions and the meta-information that accrues during development, an accurate explanation of the resulting model is hardly possible. To remedy this situation, we propose to increase process transparency by applying provenance methods, which also serve as a basis for increased explainability.
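A minimal sketch of what "applying provenance methods" during development could look like in practice (the schema below is our own illustration, not the authors' tooling): each training run logs a data fingerprint, code version, and hyper-parameters so the resulting model can later be explained in terms of how it was built.

```python
# Append-only provenance log for training runs.
import hashlib
import json
import time

import numpy as np

def log_provenance(X, params, path="runs.jsonl", code_version="unknown"):
    record = {
        "timestamp": time.time(),
        "data_sha256": hashlib.sha256(X.tobytes()).hexdigest(),
        "hyperparameters": params,
        "code_version": code_version,     # e.g. a git commit hash
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Example: record the inputs to a (hypothetical) training run.
log_provenance(np.zeros((2, 2)), {"lr": 0.01, "epochs": 10},
               code_version="abc123")
```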
2020-03-02
Vatanparvar, Korosh, Al Faruque, Mohammad Abdullah.  2019.  Self-Secured Control with Anomaly Detection and Recovery in Automotive Cyber-Physical Systems. 2019 Design, Automation Test in Europe Conference Exhibition (DATE). :788–793.

Cyber-Physical Systems (CPS) are growing in complexity and functionality. Multidisciplinary interactions with physical systems are key to CPS. However, sensors, actuators, controllers, and wireless communications are prone to attacks that compromise the system. Machine learning models have been utilized in automotive controllers to learn, estimate, and provide the required intelligence in the control process. However, their estimates are also vulnerable to attacks from the physical or cyber domains, and they have shown unreliable predictions against unknown biases resulting from the modeling. In this paper, we propose a novel control design using conditional generative adversarial networks that enables a self-secured controller to capture the normal behavior of the control loop and the physical system, detect anomalies, and recover from them. We evaluate our control design on a self-secured battery management system (BMS) by driving a Nissan Leaf S on standard driving cycles while under various attacks. Compared with the state of the art, the self-secured BMS detects the attacks with 83% accuracy and a recovery estimation error of 21% on average, improvements of 28% and 8%, respectively.
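A heavily simplified sketch of the detect-and-recover loop, with a plain ridge regressor standing in for the paper's conditional GAN: a model of normal control-loop behavior predicts the next sensor value, a large residual flags an attack, and the model's estimate replaces the compromised measurement. The signal, threshold, and attack below are all synthetic.

```python
# Residual-based anomaly detection and recovery in a sensor loop.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Normal behavior: a slowly varying signal (e.g. battery voltage under load).
t = np.arange(2000)
signal = 3.7 + 0.1 * np.sin(t / 50) + rng.normal(scale=0.01, size=t.size)

# Learn to predict the next sample from a short history window (the "normal
# behavior" model; the paper uses a conditional GAN in this role).
H = 10
Xw = np.stack([signal[i:i + H] for i in range(len(signal) - H)])
yw = signal[H:]
model = Ridge().fit(Xw[:1500], yw[:1500])

# 5-sigma detection threshold from training residuals.
threshold = 5 * np.abs(yw[:1500] - model.predict(Xw[:1500])).std()

# Online loop: inject a spoofed reading and recover with the model's estimate.
history = list(signal[1500:1500 + H])
for i in range(1500 + H, 1600):
    measured = signal[i] + (0.5 if i == 1550 else 0.0)  # sensor attack at i=1550
    predicted = model.predict([history])[0]
    if abs(measured - predicted) > threshold:
        print(f"t={i}: anomaly detected, recovering with estimate {predicted:.3f}")
        measured = predicted                            # recovery: trust the model
    history = history[1:] + [measured]
```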

2019-07-01
Amjad, N., Afzal, H., Amjad, M. F., Khan, F. A.  2018.  A Multi-Classifier Framework for Open Source Malware Forensics. 2018 IEEE 27th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE). :106–111.

Traditional anti-virus technologies have failed to keep pace with the proliferation of malware due to the slow process of updating their signatures and heuristics. Similarly, time and resource limitations make manual analysis of each malware sample infeasible. There is a need to learn from this vast quantity of data, containing cyber attack patterns, in an automated manner to proactively adapt to ever-evolving threats. Machine learning offers unique advantages for learning from past cyber attacks to handle future cyber threats. The purpose of this research is to propose a framework for multi-class classification of malware into well-known categories by applying different machine learning models over a corpus of malware analysis reports. These reports are generated in an automated manner through an open source malware sandbox. We applied extensive pre-modeling techniques for data cleaning, feature exploration, and feature engineering to prepare the training and test datasets. The best possible hyper-parameters were selected to build the machine learning models. The prepared datasets were then used to train the machine learning classifiers and to compare their prediction accuracy. Finally, the results were validated through a comprehensive 10-fold cross-validation methodology. The best results were achieved by the Gaussian Naive Bayes classifier, with 96% accuracy on a random train-test split and a 10-fold cross-validation accuracy of 91.2%. The framework can be deployed in an operational environment to learn from malware attacks and proactively adapt matching countermeasures.
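The winning configuration reported above is easy to sketch with scikit-learn (synthetic features standing in for the engineered report features): a Gaussian Naive Bayes classifier validated with 10-fold cross-validation.

```python
# Gaussian Naive Bayes with 10-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic multi-class data standing in for engineered malware-report features.
X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           n_classes=4, random_state=0)

scores = cross_val_score(GaussianNB(), X, y, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```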

2018-03-19
Das, A., Shen, M. Y., Shashanka, M., Wang, J.  2017.  Detection of Exfiltration and Tunneling over DNS. 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA). :737–742.

This paper proposes a method to detect two primary means of using the Domain Name System (DNS) for malicious purposes. We develop machine learning models to detect information exfiltration from compromised machines and the establishment of command & control (C&C) servers via tunneling. We validate our approach through experiments in which we successfully detect malware used in several recent Advanced Persistent Threat (APT) attacks [1]. The novelty of our method lies in its robustness, simplicity, scalability, and ease of deployment in a production environment.
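Typical features for this kind of detector can be sketched briefly (a generic illustration, not the paper's feature set): DNS tunneling tends to produce long, high-entropy query labels, which simple lexical features expose.

```python
# Lexical features that separate ordinary hostnames from encoded DNS payloads.
import math
from collections import Counter

def shannon_entropy(s):
    counts = Counter(s)
    return -sum(n / len(s) * math.log2(n / len(s)) for n in counts.values())

def dns_features(qname):
    label = qname.split(".")[0]           # leftmost label often carries the payload
    return {
        "qname_length": len(qname),
        "label_entropy": shannon_entropy(label),
        "digit_ratio": sum(c.isdigit() for c in label) / max(len(label), 1),
    }

print(dns_features("mail.example.com"))
print(dns_features("4a7b0c9f2e8d1a6b3c5d7e9f0a2b4c6d.evil.example"))
```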