VisionGuard: Runtime Detection of Adversarial Inputs to Perception Systems
Deep neural networks (DNNs) have been deployed in many safety-critical systems, such as medical imaging, autonomous cars, and surveillance systems. At the same time, DNNs have been shown to be vulnerable to adversarial examples [1], i.e., inputs that have been deliberately modified to cause either a misclassification or a specific incorrect prediction that benefits the attacker.
In this work, to establish secure and high-confidence DNN-based perception systems, we propose VisionGuard, an attack- and dataset-agnostic detection framework for defense against adversarial input images. VisionGuard relies on two key observations. First, adversaries take a real image and generate an attacked image by exploiting the large feature space over which they can search for adversarial inputs. We have validated this observation in our experiments: the larger the input space (i.e., the image dimensions), the easier it is to fool the target classifier.
Second, we observe that the computer vision community has 50+ years of experience in designing compression algorithms for real images, not attacked images. Thus, we expect real images to be reconstructed better by lossy compression than attacked images. Leveraging these two observations, VisionGuard effectively shrinks the feature space available to adversaries by processing both the original (possibly attacked) image and a 'refined' version generated through lossy compression (e.g., JPEG) with high compression quality. To determine whether an image is adversarial, VisionGuard checks whether the softmax output of the target classifier changes significantly when it is fed the 'refined' version of the input image. We measure the similarity of the corresponding softmax outputs using the Kullback-Leibler (KL) divergence. If this metric exceeds a threshold, the image is classified as adversarial; otherwise, it is classified as clean. Conveniently, VisionGuard neither modifies the target classifier nor relies on building separate classifiers; as such, the proposed approach can be used in coordination with existing defenses against adversarial examples, such as adversarial/robust training [2] and image purification [3].
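As a minimal sketch of this mechanism (not the authors' implementation): the helper below re-encodes an image with lossy JPEG compression, queries a classifier for its softmax output on both the original and the refined version, and flags the image when the KL divergence between the two outputs exceeds a threshold. The `softmax_fn` callable, the JPEG `quality` value, and the threshold `tau` are placeholders chosen for illustration.

```python
import io
import numpy as np
from PIL import Image

def jpeg_refine(image: Image.Image, quality: int = 92) -> Image.Image:
    """Re-encode the image with lossy JPEG compression at high quality."""
    buf = io.BytesIO()
    image.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q) between two softmax output vectors."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def is_adversarial(image: Image.Image, softmax_fn, tau: float) -> bool:
    """Flag the input as adversarial if the classifier's softmax output
    shifts by more than tau (in KL divergence) after JPEG refinement.
    softmax_fn is assumed to map a PIL image to a softmax vector."""
    p_original = softmax_fn(image)              # one forward pass on the raw input
    p_refined = softmax_fn(jpeg_refine(image))  # one extra forward pass on the refined input
    return kl_divergence(p_original, p_refined) > tau
```

Note that the only runtime overhead is one JPEG re-encoding and one additional forward pass; no retraining or auxiliary model is involved.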
Similar defenses that rely on image transformations have also been proposed [4], [5]. For instance, MagNet [4] employs autoencoders, instead of compression algorithms, to generate new images reconstructed from the original ones.
Nevertheless, MagNet is dataset-specific, as it requires training a new autoencoder for each dataset, a task that is particularly challenging and computationally demanding, especially for large image domains. Image transformations have also been employed in [5] to detect adversarial inputs, but in a completely different way than the proposed one. In particular, [5] relies on building a DNN-based detector that takes as input K x N features, where K is the number of applied transformations (e.g., rotation and translation) and N is the number of logits/classes. Note that [5] is significantly more computationally expensive than VisionGuard, both at runtime and at design time.
Particularly, at design time, [5] requires training a DNN for the dataset of interest, and at runtime, it requires applying K image transformations and extracting the last-hidden-layer output for each transformed image. Alternative detectors have been proposed in [6], [7] that require extracting and storing the last-hidden-layer output for all training images, a time-consuming and dataset-specific process that may not be possible on all platforms (e.g., lightweight IoT cameras) due to excessive memory requirements. These embeddings are used at runtime to check whether an image is adversarial. Common to the above detectors is that they are dataset-specific. Therefore, it is unclear how these methods perform when they are deployed in real-world environments for which datasets do not exist. Moreover, the above detectors have only been evaluated on small-scale datasets such as MNIST and CIFAR10.
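To make the comparison concrete, the sketch below reflects our reading of the feature vector a [5]-style detector consumes at runtime; the `transforms` list and `logits_fn` callable are hypothetical placeholders rather than the authors' code. Each input requires K additional forward passes (one per transformation), whereas VisionGuard requires only one.

```python
import numpy as np

def transformation_features(image, transforms, logits_fn):
    """Apply each of the K transformations and stack the classifier's
    N logits, yielding the K*N feature vector fed to a trained detector."""
    feats = [np.asarray(logits_fn(t(image))) for t in transforms]  # K vectors of length N
    return np.concatenate(feats)                                   # shape (K*N,)
```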
We evaluate VisionGuard on the MNIST, CIFAR10, and ImageNet datasets and show that, unlike related works, it is computationally lightweight in terms of runtime and memory requirements, even when applied to large-scale datasets such as ImageNet; therefore, it can be employed in real-time applications that may also involve large-scale image spaces.
For instance, training the detectors in [6], [7] for the ImageNet dataset required approximately 38 hours, and their performance on ImageNet is comparable to that of a random detector.
To the best of our knowledge, VisionGuard is the first attack-agnostic and dataset-agnostic detection technique for defense against adversarial examples. Moreover, VisionGuard is the first detector that scales to large image domains (e.g., the ImageNet dataset) while attaining high detection performance under a wide range of attacks; e.g., the area under the Receiver Operating Characteristic (ROC) curve (AUC) is always greater than 90%.
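As an illustration of how such an AUC figure is computed (with made-up scores, not the paper's data): the KL divergence produced by the detector serves as a score for each test image, and the AUC summarizes detection quality over all possible thresholds.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical detector scores: KL divergences on a mix of
# clean (label 0) and adversarial (label 1) test images.
labels = np.array([0, 0, 0, 1, 1, 1])
kl_scores = np.array([0.01, 0.03, 0.02, 0.80, 0.50, 1.20])

# AUC is threshold-independent: it measures how well the scores
# rank adversarial images above clean ones.
auc = roc_auc_score(labels, kl_scores)
print(f"AUC: {auc:.3f}")  # 1.0 for this toy example; the paper reports > 0.90 under real attacks
```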
References:
[1] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," arXiv preprint arXiv:1312.6199, 2013.
[2] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015.
[3] N. Das, M. Shanbhogue, S.-T. Chen, F. Hohman, S. Li, L. Chen, M. E. Kounavis, and D. H. Chau, "Shield: Fast, practical defense and vaccination for deep learning using jpeg compression," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018, pp. 196-204.
[4] D. Meng and H. Chen, "Magnet: a two-pronged defense against adversarial examples," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017, pp. 135-147.
[5] S. Tian, G. Yang, and Y. Cai, "Detecting adversarial examples through image transformation," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[6] R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner, "Detecting adversarial samples from artifacts," arXiv preprint arXiv:1703.00410, 2017.
[7] T. Pang, C. Du, Y. Dong, and J. Zhu, "Towards robust detection of adversarial examples," in Advances in Neural Information Processing Systems, 2018, pp. 4584-4594.
Yiannis Kantaros received the Diploma in Electrical and Computer Engineering in 2012 from the University of Patras, Patras, Greece. He also received the M.Sc. and the Ph.D. degrees in Mechanical Engineering from Duke University, Durham, NC, in 2017 and 2018, respectively. He is currently a postdoctoral researcher in the Department of Computer and Information Science at the University of Pennsylvania. His research focuses on distributed control, machine learning, and formal methods with applications to distributed robotics. He received the Best Student Paper Award at the 2nd IEEE Global Conference on Signal and Information Processing in 2014 and the 2017-18 Outstanding Dissertation Research Award from the Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC.