Biblio

Filters: Keyword is data poisoning
2023-06-22
Seetharaman, Sanjay, Malaviya, Shubham, Vasu, Rosni, Shukla, Manish, Lodha, Sachin.  2022.  Influence Based Defense Against Data Poisoning Attacks in Online Learning. 2022 14th International Conference on COMmunication Systems & NETworkS (COMSNETS). :1–6.
Data poisoning is a type of adversarial attack on training data in which an attacker manipulates a fraction of the data to degrade the performance of a machine learning model. Several defensive mechanisms are known for handling offline attacks; however, defensive measures for online learning, where data points arrive sequentially, have not garnered similar interest. In this work, we propose a defense mechanism that minimizes the degradation caused by poisoned training data on a learner's model in an online setup. Our proposed method utilizes the influence function, a classic technique from robust statistics, and supplements it with existing data sanitization methods to filter out some of the poisoned data points. We study the effectiveness of our defense mechanism on multiple datasets and across multiple attack strategies against an online learner.
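As a concrete illustration of the core idea, the sketch below scores an incoming point by its estimated influence on the loss over a small trusted validation set, a standard influence-function construction in the spirit of Koh and Liang; the logistic-regression model, the zero threshold, and all data are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: influence-based filtering of streamed points for an online
# logistic-regression learner. Not the paper's method; the scoring rule is
# the standard influence-function estimate of the change in validation loss.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(theta, x, y):
    # Gradient of the logistic loss at one example (y in {0, 1}).
    return (sigmoid(x @ theta) - y) * x

def hessian(theta, X, lam=1e-3):
    # Hessian of the regularized empirical logistic loss (label-free).
    p = sigmoid(X @ theta)
    W = p * (1 - p)
    return (X.T * W) @ X / len(X) + lam * np.eye(X.shape[1])

def influence_on_validation(theta, X_train, X_val, y_val, x_new, y_new):
    # I(z_new) ~= -g_val^T H^{-1} g_new: estimated change in validation loss
    # if z_new is upweighted. Large positive values flag harmful points.
    H = hessian(theta, X_train)
    g_val = np.mean([grad(theta, x, y) for x, y in zip(X_val, y_val)], axis=0)
    g_new = grad(theta, x_new, y_new)
    return -g_val @ np.linalg.solve(H, g_new)

rng = np.random.default_rng(0)
d = 5
X_train = rng.normal(size=(200, d)); y_train = (X_train[:, 0] > 0).astype(float)
X_val = rng.normal(size=(50, d));    y_val = (X_val[:, 0] > 0).astype(float)
theta = np.zeros(d)
for _ in range(300):  # crude batch fit to obtain the current model
    theta -= 0.1 * np.mean([grad(theta, x, y) for x, y in zip(X_train, y_train)], axis=0)

x_new, y_new = rng.normal(size=d), 1.0  # candidate streamed point
score = influence_on_validation(theta, X_train, X_val, y_val, x_new, y_new)
print("reject" if score > 0.0 else "accept", score)  # threshold is a tunable knob
```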
ISSN: 2155-2509
2023-01-13
Taneja, Vardaan, Chen, Pin-Yu, Yao, Yuguang, Liu, Sijia.  2022.  When Does Backdoor Attack Succeed in Image Reconstruction? A Study of Heuristics vs. Bi-Level Solution. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :4398–4402.
Recent studies have demonstrated the lack of robustness of image reconstruction networks to test-time evasion attacks, posing security risks and the potential for misdiagnoses. In this paper, we provide the first evaluation of how vulnerable such networks are to training-time poisoning attacks. In contrast to image classification, we find that basic trigger-embedded backdoor attacks executed against these models using heuristics lead to poor attack performance; generating backdoor attacks for image reconstruction is thus non-trivial. To tackle the problem, we propose a bi-level optimization (BLO)-based attack generation method and investigate its effectiveness on image reconstruction. We show that BLO-generated backdoor attacks yield a significant improvement over the heuristics-based attack strategy.
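A minimal sketch of what a BLO-style backdoor-generation loop can look like for a toy denoising reconstructor is given below; the alternating first-order inner/outer updates are a common approximation of bi-level optimization, and the model, additive trigger parameterization, poisoning ratio, and loss weights are all illustrative assumptions rather than the authors' method.

```python
# Hedged sketch of alternating bi-level backdoor generation for a tiny
# image reconstructor. Inner level: train the model on a poisoned mixture.
# Outer level: craft the trigger that drives triggered inputs to a target.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1))      # toy reconstructor
trigger = torch.zeros(1, 1, 16, 16, requires_grad=True)    # additive trigger
opt_inner = torch.optim.Adam(model.parameters(), lr=1e-3)  # inner: fit model
opt_outer = torch.optim.Adam([trigger], lr=1e-2)           # outer: fit trigger

clean = torch.rand(32, 1, 16, 16)
target = torch.zeros_like(clean)   # attacker-chosen reconstruction target
mse = nn.MSELoss()

for step in range(200):
    # Inner problem: train on clean pairs plus triggered inputs paired with
    # the target; detach the trigger so only model weights are updated here.
    poisoned = (clean[:8] + trigger).clamp(0, 1)
    loss_inner = (mse(model(clean[8:]), clean[8:]) +
                  mse(model(poisoned.detach()), target[:8]))
    opt_inner.zero_grad(); loss_inner.backward(); opt_inner.step()

    # Outer problem: push triggered inputs toward the target under the
    # current model, with a small-norm penalty for stealth.
    loss_outer = (mse(model((clean[:8] + trigger).clamp(0, 1)), target[:8]) +
                  0.1 * trigger.abs().mean())
    opt_outer.zero_grad(); loss_outer.backward(); opt_outer.step()
```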
2023-01-06
Fan, Jiaxin, Yan, Qi, Li, Mohan, Qu, Guanqun, Xiao, Yang.  2022.  A Survey on Data Poisoning Attacks and Defenses. 2022 7th IEEE International Conference on Data Science in Cyberspace (DSC). :48–55.
With the widespread deployment of data-driven services, the demand for data continues to grow. At present, many applications lack reliable human supervision during data collection, so the collected data may contain low-quality or even malicious samples. Such data exposes AI systems to serious security challenges. One of the main security threats in the training phase of machine learning is the data poisoning attack, which compromises model integrity by contaminating training data so that the resulting model is skewed or unusable. This paper reviews the research on data poisoning attacks across various task environments: first, we summarize a classification of attacks; then, we organize the defense methods against data poisoning; finally, we outline possible research directions.
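For readers new to the topic, the snippet below demonstrates the simplest attack in the surveyed taxonomy, random label flipping, by comparing test accuracy before and after flipping a fraction of training labels; the dataset, model, and 10% poisoning rate are arbitrary choices for illustration, not drawn from the paper.

```python
# Hedged illustration: random label-flip poisoning on a synthetic task.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

clean_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

rng = np.random.default_rng(0)
y_poison = y_tr.copy()
idx = rng.choice(len(y_poison), size=len(y_poison) // 10, replace=False)
y_poison[idx] = 1 - y_poison[idx]   # flip 10% of the training labels

poisoned_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_poison).score(X_te, y_te)
print(f"clean: {clean_acc:.3f}  poisoned: {poisoned_acc:.3f}")
```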
Franci, Adriano, Cordy, Maxime, Gubri, Martin, Papadakis, Mike, Traon, Yves Le.  2022.  Influence-Driven Data Poisoning in Graph-Based Semi-Supervised Classifiers. 2022 IEEE/ACM 1st International Conference on AI Engineering – Software Engineering for AI (CAIN). :77–87.
Graph-based Semi-Supervised Learning (GSSL) is a practical solution for learning from a limited amount of labelled data together with a vast amount of unlabelled data. However, because these algorithms rely on the known labels to infer the unknown ones, they are sensitive to data quality. It is therefore essential to study the potential threats related to the labelled data, more specifically label poisoning. In this paper, we propose a novel data poisoning method that efficiently approximates the result of label inference to identify the inputs which, if poisoned, would produce the highest number of incorrectly inferred labels. We extensively evaluate our approach on three classification problems under 24 different experimental settings each. Compared to the state of the art, our influence-driven attack produces an average error-rate increase that is 50% higher, while being faster by multiple orders of magnitude. Moreover, our method can point engineers to the inputs that deserve investigation (and relabelling) before training the learning model. We show that relabelling one-third of the poisoned inputs (selected based on their influence) reduces the poisoning effect by 50%.
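The sketch below conveys the flavor of influence-driven label poisoning against label propagation, the canonical GSSL inference rule: it flips each candidate label in turn, re-runs inference, and poisons the most influential one. Note that this brute-force re-propagation is exactly the cost the paper's approximation avoids; the graph construction, propagation variant, and budget of a single flip are illustrative assumptions.

```python
# Hedged sketch: greedy influence scoring of label flips against label
# propagation on a toy two-cluster graph.
import numpy as np

def propagate(W, y_labelled, labelled, n, alpha=0.9, iters=50):
    # Standard label propagation: F <- alpha * S @ F + (1 - alpha) * Y.
    d = W.sum(axis=1); S = W / np.sqrt(np.outer(d, d))
    Y = np.zeros((n, 2)); Y[labelled, y_labelled] = 1.0
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y
    return F.argmax(axis=1)

rng = np.random.default_rng(1)
n = 60
X = np.vstack([rng.normal(-1, 0.5, (30, 2)), rng.normal(1, 0.5, (30, 2))])
truth = np.array([0] * 30 + [1] * 30)
W = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))  # dense RBF affinity
np.fill_diagonal(W, 0)

labelled = np.arange(0, n, 6)                       # 10 labelled nodes
y_lab = truth[labelled].copy()
baseline = propagate(W, y_lab, labelled, n)

# Try flipping each labelled node, count how many inferred labels change,
# then poison the single most influential one.
scores = []
for i in range(len(labelled)):
    y_try = y_lab.copy(); y_try[i] = 1 - y_try[i]
    scores.append((propagate(W, y_try, labelled, n) != baseline).sum())
worst = int(np.argmax(scores))
y_lab[worst] = 1 - y_lab[worst]
poisoned = propagate(W, y_lab, labelled, n)
print("flipped node", labelled[worst], "-> errors:",
      (poisoned != truth).sum(), "vs baseline:", (baseline != truth).sum())
```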
2022-02-09
Cinà, Antonio Emanuele, Vascon, Sebastiano, Demontis, Ambra, Biggio, Battista, Roli, Fabio, Pelillo, Marcello.  2021.  The Hammer and the Nut: Is Bilevel Optimization Really Needed to Poison Linear Classifiers? 2021 International Joint Conference on Neural Networks (IJCNN). :1–8.
One of the most concerning threats to modern AI systems is data poisoning, where the attacker injects maliciously crafted training data to corrupt the system's behavior at test time. Availability poisoning is a particularly worrisome subset of poisoning attacks in which the attacker aims to cause a Denial-of-Service (DoS). However, state-of-the-art algorithms are computationally expensive because they try to solve a complex bi-level optimization problem (the "hammer"). We observe that under particular conditions, namely when the target model is linear (the "nut"), such computationally costly procedures can be avoided. We propose a counter-intuitive but efficient heuristic that contaminates the training set so that the target system's performance is severely compromised. We further suggest a re-parameterization trick to decrease the number of variables to be optimized. Finally, we demonstrate that, under the considered settings, our framework achieves comparable, or even better, performance in terms of the attacker's objective while being significantly more computationally efficient.
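As a hedged stand-in for such a heuristic, the snippet below flips the labels of the training points about which a linear classifier is most confident, so that each poisoned point exerts maximal pull on the retrained decision boundary; this margin-based rule and the 10% budget are illustrative choices, not the paper's exact procedure.

```python
# Hedged sketch: a non-bilevel availability heuristic against a linear model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
clf = LinearSVC(max_iter=5000).fit(X, y)

margin = clf.decision_function(X) * (2 * y - 1)  # signed confidence per point
budget = len(y) // 10                            # poison 10% of the training set
victims = np.argsort(margin)[-budget:]           # most confidently-correct points

y_poison = y.copy()
y_poison[victims] = 1 - y_poison[victims]        # flip the high-margin labels
degraded = LinearSVC(max_iter=5000).fit(X, y_poison)
print("clean acc:", clf.score(X, y), " poisoned acc:", degraded.score(X, y))
```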
2021-11-08
Varshney, Kush R.  2020.  On Mismatched Detection and Safe, Trustworthy Machine Learning. 2020 54th Annual Conference on Information Sciences and Systems (CISS). :1–4.
Instilling trust in high-stakes applications of machine learning is becoming essential. Trust may be decomposed into four dimensions: basic accuracy, reliability, human interaction, and aligned purpose. The first two of these also constitute the properties of safe machine learning systems. The second dimension, reliability, is mainly concerned with being robust to epistemic uncertainty and model mismatch. It arises in the machine learning paradigms of distribution shift, data poisoning attacks, and algorithmic fairness. All of these problems can be abstractly modeled using the theory of mismatched hypothesis testing from statistical signal processing. By doing so, we can take advantage of performance characterizations in that literature to better understand the various machine learning issues.
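For reference, a generic rendering of the mismatched hypothesis testing setup the abstract invokes is sketched below; the notation is the standard textbook formulation, not taken from the paper itself.

```latex
% Under H_0 the data follow p_0, under H_1 they follow p_1, but the detector
% only has access to mismatched models q_0 and q_1 and thresholds the
% mismatched log-likelihood ratio:
\[
  \hat{H}(x) =
  \begin{cases}
    H_1, & \log \dfrac{q_1(x)}{q_0(x)} \ge \tau, \\[4pt]
    H_0, & \text{otherwise,}
  \end{cases}
\]
% so distribution shift, data poisoning, and group-dependent data (fairness)
% all appear as instances of p_i \neq q_i, with error probabilities governed
% by divergences such as D(p_0 \| q_0) and D(p_1 \| q_1).
```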