Biblio
Artificial neural networks in general and deep learning networks in particular established themselves as popular and powerful machine learning algorithms. While the often tremendous sizes of these networks are beneficial when solving complex tasks, the tremendous number of parameters also causes such networks to be vulnerable to malicious behavior such as adversarial perturbations. These perturbations can change a model's classification decision. Moreover, while single-step adversaries can easily be transferred from network to network, the transfer of more powerful multi-step adversaries has - usually - been rather difficult.In this work, we introduce a method for generating strong adversaries that can easily (and frequently) be transferred between different models. This method is then used to generate a large set of adversaries, based on which the effects of selected defense methods are experimentally assessed. At last, we introduce a novel, simple, yet effective approach to enhance the resilience of neural networks against adversaries and benchmark it against established defense methods. In contrast to the already existing methods, our proposed defense approach is much more efficient as it only requires a single additional forward-pass to achieve comparable performance results.
With the remarkable success of deep learning, Deep Neural Networks (DNNs) have been applied as dominant tools to various machine learning domains. Despite this success, however, it has been found that DNNs are surprisingly vulnerable to malicious attacks; adding a small, perceptually indistinguishable perturbations to the data can easily degrade classification performance. Adversarial training is an effective defense strategy to train a robust classifier. In this work, we propose to utilize the generator to learn how to create adversarial examples. Unlike the existing approaches that create a one-shot perturbation by a deterministic generator, we propose a recursive and stochastic generator that produces much stronger and diverse perturbations that comprehensively reveal the vulnerability of the target classifier. Our experiment results on MNIST and CIFAR-10 datasets show that the classifier adversarially trained with our method yields more robust performance over various white-box and black-box attacks.
Adversarial attacks to image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks. Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. Specifically, our networks contain blocks that denoise the features using non-local means or other filters; the entire networks are trained end-to-end. When combined with adversarial training, our feature denoising networks substantially improve the state-of-the-art in adversarial robustness in both white-box and black-box attack settings. On ImageNet, under 10-iteration PGD white-box attacks where prior art has 27.9% accuracy, our method achieves 55.7%; even under extreme 2000-iteration PGD white-box attacks, our method secures 42.6% accuracy. Our method was ranked first in Competition on Adversarial Attacks and Defenses (CAAD) 2018 — it achieved 50.6% classification accuracy on a secret, ImageNet-like test dataset against 48 unknown attackers, surpassing the runner-up approach by 10%. Code is available at https://github.com/facebookresearch/ImageNet-Adversarial-Training.
Deep neural networks (DNNs) exhibit excellent performance in machine learning tasks such as image recognition, pattern recognition, speech recognition, and intrusion detection. However, the usage of adversarial examples, which are intentionally corrupted by noise, can lead to misclassification. As adversarial examples are serious threats to DNNs, both adversarial attacks and methods of defending against adversarial examples have been continuously studied. Zero-day adversarial examples are created with new test data and are unknown to the classifier; hence, they represent a more significant threat to DNNs. To the best of our knowledge, there are no analytical studies in the literature of zero-day adversarial examples with a focus on attack and defense methods through experiments using several scenarios. Therefore, in this study, zero-day adversarial examples are practically analyzed with an emphasis on attack and defense methods through experiments using various scenarios composed of a fixed target model and an adaptive target model. The Carlini method was used for a state-of-the-art attack, while an adversarial training method was used as a typical defense method. We used the MNIST dataset and analyzed success rates of zero-day adversarial examples, average distortions, and recognition of original samples through several scenarios of fixed and adaptive target models. Experimental results demonstrate that changing the parameters of the target model in real time leads to resistance to adversarial examples in both the fixed and adaptive target models.
The paper presents a fully automatic end-to-end trainable system to colorize grayscale images. Colorization is a highly under-constrained problem. In order to produce realistic outputs, the proposed approach takes advantage of the recent advances in deep learning and generative networks. To achieve plausible colorization, the paper investigates conditional Wasserstein Generative Adversarial Networks (WGAN) [3] as a solution to this problem. Additionally, a loss function consisting of two classification loss components apart from the adversarial loss learned by the WGAN is proposed. The first classification loss provides a measure of how much the predicted colored images differ from ground truth. The second classification loss component makes use of ground truth semantic classification labels in order to learn meaningful intermediate features. Finally, WGAN training procedure pushes the predictions to the manifold of natural images. The system is validated using a user study and a semantic interpretability test and achieves results comparable to [1] on Imagenet dataset [10].