Detecting Adversarial Examples for Deep Neural Networks via Layer Directed Discriminative Noise Injection
Title | Detecting Adversarial Examples for Deep Neural Networks via Layer Directed Discriminative Noise Injection |
Publication Type | Conference Paper |
Year of Publication | 2019 |
Authors | Wang, Si; Liu, Wenye; Chang, Chip-Hong |
Conference Name | 2019 Asian Hardware Oriented Security and Trust Symposium (AsianHOST) |
Date Published | Dec. 2019 |
Keywords | adversarial examples, adversarial images, Computer architecture, Computer vision, computer vision tasks, convolutional neural nets, Deep Learning, deep neural networks, discriminative noise injection strategy, distortion, dominant layers, false positive rate, false trust, layer directed discriminative noise, learning (artificial intelligence), machine learning, MobileNet, natural images, natural scenes, Neural networks, noninvasive universal perturbation attack, Perturbation methods, policy-based governance, Policy-Governed Secure Collaboration, pubcrawl, resilience, Resiliency, Scalability, Sensitivity, Training |
Abstract | Deep learning is a popular and powerful machine learning solution for computer vision tasks. Its most criticized vulnerability is its poor tolerance of adversarial images, which are obtained by deliberately adding imperceptibly small perturbations to clean inputs. Such inputs can delude a classifier into making wrong decisions. Previous defensive techniques have mostly focused on refining the model or transforming the input; they have either been demonstrated only on small datasets or shown limited success. Furthermore, they are rarely scrutinized from the hardware perspective, even though Artificial Intelligence (AI) on a chip is the roadmap for embedded intelligence everywhere. In this paper, we propose a new discriminative noise injection strategy that adaptively selects a few dominant layers and progressively discriminates adversarial from benign inputs. This is made possible by evaluating the difference in label change rate between adversarial and natural images when different amounts of noise are injected into the weights of individual layers of the model. The approach is evaluated on the ImageNet dataset with 8-bit truncated models of state-of-the-art DNN architectures. The results show a detection rate of up to 88.00% with only approximately 5% false positives for MobileNet. Both the detection rate and the false positive rate improve well beyond existing advanced defenses against the most practical non-invasive universal perturbation attack on deep-learning-based AI chips. (A minimal code sketch of the label-change-rate test appears after this record.) |
URL | https://ieeexplore.ieee.org/document/9006702 |
DOI | 10.1109/AsianHOST47458.2019.9006702 |
Citation Key | wang_detecting_2019 |
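The sketch below illustrates the general label-change-rate idea described in the abstract: inject noise into the weights of a few selected layers, re-run inference several times, and measure how often the top-1 label flips. It is not the authors' implementation; the layer name, noise scale, trial count, and decision threshold are illustrative assumptions, it uses a full-precision PyTorch/torchvision MobileNetV2 (assuming a recent torchvision) rather than the paper's 8-bit truncated models, and the "dominant" layer is picked by hand instead of adaptively.

```python
# Minimal sketch of detection via weight-directed noise injection and
# label change rate. Assumptions: PyTorch + a recent torchvision,
# hand-picked "dominant" layer, illustrative noise scale and threshold.
import torch
import torch.nn as nn
import torchvision.models as models


def label_change_rate(model: nn.Module, x: torch.Tensor,
                      layer_names, noise_std: float = 0.05,
                      trials: int = 20) -> float:
    """Fraction of noisy forward passes whose top-1 label differs from the
    clean prediction when Gaussian noise is injected into the weights of
    the selected layers."""
    model.eval()
    with torch.no_grad():
        clean_label = model(x).argmax(dim=1)

        params = dict(model.named_parameters())
        originals = {n: params[n].detach().clone() for n in layer_names}

        flips = 0
        for _ in range(trials):
            for n in layer_names:
                # Scale the noise relative to the layer's mean |weight|.
                noise = (noise_std * originals[n].abs().mean()
                         * torch.randn_like(originals[n]))
                params[n].copy_(originals[n] + noise)
            noisy_label = model(x).argmax(dim=1)
            flips += int((noisy_label != clean_label).item())

        # Restore the unperturbed weights.
        for n in layer_names:
            params[n].copy_(originals[n])

    return flips / trials


if __name__ == "__main__":
    # Hypothetical usage: downloads pretrained MobileNetV2 weights.
    net = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
    img = torch.randn(1, 3, 224, 224)      # stand-in for a preprocessed image
    dominant = ["features.18.0.weight"]    # assumed "dominant" layer choice
    rate = label_change_rate(net, img, dominant)
    # Flag as adversarial if the label flips unusually often under weight
    # noise; the 0.3 threshold is an illustrative guess, not from the paper.
    print("adversarial" if rate > 0.3 else "benign", rate)
```

Scaling the injected noise by each layer's mean absolute weight (rather than using a fixed standard deviation) keeps the relative perturbation comparable across layers of very different magnitudes; whether this matches the authors' noise schedule is an assumption of this sketch.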