Visible to the public Biblio

Filters: Keyword is adversarial perturbations  [Clear All Filters]
2021-01-15
Gandhi, A., Jain, S..  2020.  Adversarial Perturbations Fool Deepfake Detectors. 2020 International Joint Conference on Neural Networks (IJCNN). :1—8.
This work uses adversarial perturbations to enhance deepfake images and fool common deepfake detectors. We created adversarial perturbations using the Fast Gradient Sign Method and the Carlini and Wagner L2 norm attack in both blackbox and whitebox settings. Detectors achieved over 95% accuracy on unperturbed deepfakes, but less than 27% accuracy on perturbed deepfakes. We also explore two improvements to deep-fake detectors: (i) Lipschitz regularization, and (ii) Deep Image Prior (DIP). Lipschitz regularization constrains the gradient of the detector with respect to the input in order to increase robustness to input perturbations. The DIP defense removes perturbations using generative convolutional neural networks in an unsupervised manner. Regularization improved the detection of perturbed deepfakes on average, including a 10% accuracy boost in the blackbox case. The DIP defense achieved 95% accuracy on perturbed deepfakes that fooled the original detector while retaining 98% accuracy in other cases on a 100 image subsample.
2020-12-01
Usama, M., Asim, M., Latif, S., Qadir, J., Ala-Al-Fuqaha.  2019.  Generative Adversarial Networks For Launching and Thwarting Adversarial Attacks on Network Intrusion Detection Systems. 2019 15th International Wireless Communications Mobile Computing Conference (IWCMC). :78—83.

Intrusion detection systems (IDSs) are an essential cog of the network security suite that can defend the network from malicious intrusions and anomalous traffic. Many machine learning (ML)-based IDSs have been proposed in the literature for the detection of malicious network traffic. However, recent works have shown that ML models are vulnerable to adversarial perturbations through which an adversary can cause IDSs to malfunction by introducing a small impracticable perturbation in the network traffic. In this paper, we propose an adversarial ML attack using generative adversarial networks (GANs) that can successfully evade an ML-based IDS. We also show that GANs can be used to inoculate the IDS and make it more robust to adversarial perturbations.

2020-09-04
Song, Chengru, Xu, Changqiao, Yang, Shujie, Zhou, Zan, Gong, Changhui.  2019.  A Black-Box Approach to Generate Adversarial Examples Against Deep Neural Networks for High Dimensional Input. 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC). :473—479.
Generating adversarial samples is gathering much attention as an intuitive approach to evaluate the robustness of learning models. Extensive recent works have demonstrated that numerous advanced image classifiers are defenseless to adversarial perturbations in the white-box setting. However, the white-box setting assumes attackers to have prior knowledge of model parameters, which are generally inaccessible in real world cases. In this paper, we concentrate on the hard-label black-box setting where attackers can only pose queries to probe the model parameters responsible for classifying different images. Therefore, the issue is converted into minimizing non-continuous function. A black-box approach is proposed to address both massive queries and the non-continuous step function problem by applying a combination of a linear fine-grained search, Fibonacci search, and a zeroth order optimization algorithm. However, the input dimension of a image is so high that the estimation of gradient is noisy. Hence, we adopt a zeroth-order optimization method in high dimensions. The approach converts calculation of gradient into a linear regression model and extracts dimensions that are more significant. Experimental results illustrate that our approach can relatively reduce the amount of queries and effectively accelerate convergence of the optimization method.
Taori, Rohan, Kamsetty, Amog, Chu, Brenton, Vemuri, Nikita.  2019.  Targeted Adversarial Examples for Black Box Audio Systems. 2019 IEEE Security and Privacy Workshops (SPW). :15—20.
The application of deep recurrent networks to audio transcription has led to impressive gains in automatic speech recognition (ASR) systems. Many have demonstrated that small adversarial perturbations can fool deep neural networks into incorrectly predicting a specified target with high confidence. Current work on fooling ASR systems have focused on white-box attacks, in which the model architecture and parameters are known. In this paper, we adopt a black-box approach to adversarial generation, combining the approaches of both genetic algorithms and gradient estimation to solve the task. We achieve a 89.25% targeted attack similarity, with 35% targeted attack success rate, after 3000 generations while maintaining 94.6% audio file similarity.
Usama, Muhammad, Qayyum, Adnan, Qadir, Junaid, Al-Fuqaha, Ala.  2019.  Black-box Adversarial Machine Learning Attack on Network Traffic Classification. 2019 15th International Wireless Communications Mobile Computing Conference (IWCMC). :84—89.

Deep machine learning techniques have shown promising results in network traffic classification, however, the robustness of these techniques under adversarial threats is still in question. Deep machine learning models are found vulnerable to small carefully crafted adversarial perturbations posing a major question on the performance of deep machine learning techniques. In this paper, we propose a black-box adversarial attack on network traffic classification. The proposed attack successfully evades deep machine learning-based classifiers which highlights the potential security threat of using deep machine learning techniques to realize autonomous networks.

2020-05-22
Dubey, Abhimanyu, Maaten, Laurens van der, Yalniz, Zeki, Li, Yixuan, Mahajan, Dhruv.  2019.  Defense Against Adversarial Images Using Web-Scale Nearest-Neighbor Search. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). :8759—8768.
A plethora of recent work has shown that convolutional networks are not robust to adversarial images: images that are created by perturbing a sample from the data distribution as to maximize the loss on the perturbed example. In this work, we hypothesize that adversarial perturbations move the image away from the image manifold in the sense that there exists no physical process that could have produced the adversarial image. This hypothesis suggests that a successful defense mechanism against adversarial images should aim to project the images back onto the image manifold. We study such defense mechanisms, which approximate the projection onto the unknown image manifold by a nearest-neighbor search against a web-scale image database containing tens of billions of images. Empirical evaluations of this defense strategy on ImageNet suggest that it very effective in attack settings in which the adversary does not have access to the image database. We also propose two novel attack methods to break nearest-neighbor defense settings and show conditions under which nearest-neighbor defense fails. We perform a series of ablation experiments, which suggest that there is a trade-off between robustness and accuracy between as we use features from deeper in the network, that a large index size (hundreds of millions) is crucial to get good performance, and that careful construction of database is crucial for robustness against nearest-neighbor attacks.
2020-04-17
Xie, Cihang, Wu, Yuxin, Maaten, Laurens van der, Yuille, Alan L., He, Kaiming.  2019.  Feature Denoising for Improving Adversarial Robustness. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). :501—509.

Adversarial attacks to image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks. Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. Specifically, our networks contain blocks that denoise the features using non-local means or other filters; the entire networks are trained end-to-end. When combined with adversarial training, our feature denoising networks substantially improve the state-of-the-art in adversarial robustness in both white-box and black-box attack settings. On ImageNet, under 10-iteration PGD white-box attacks where prior art has 27.9% accuracy, our method achieves 55.7%; even under extreme 2000-iteration PGD white-box attacks, our method secures 42.6% accuracy. Our method was ranked first in Competition on Adversarial Attacks and Defenses (CAAD) 2018 — it achieved 50.6% classification accuracy on a secret, ImageNet-like test dataset against 48 unknown attackers, surpassing the runner-up approach by 10%. Code is available at https://github.com/facebookresearch/ImageNet-Adversarial-Training.