Biblio

Filters: Author is Lin, Weiran
2023-01-30
Lin, Weiran, Lucas, Keane, Bauer, Lujo, Reiter, Michael K., Sharif, Mahmood. 2022. Constrained Gradient Descent: A Powerful and Principled Evasion Attack Against Neural Networks. Proceedings of the 39th International Conference on Machine Learning.

We propose new, more efficient targeted whitebox attacks against deep neural networks. Our attacks better align with the attacker’s goal: (1) tricking a model to assign higher probability to the target class than to any other class, while (2) staying within an ϵ-distance of the attacked input. First, we demonstrate a loss function that explicitly encodes (1) and show that Auto-PGD finds more attacks with it. Second, we propose a new attack method, Constrained Gradient Descent (CGD), using a refinement of our loss function that captures both (1) and (2). CGD seeks to satisfy both attacker objectives (misclassification and bounded ℓp-norm) in a principled manner, as part of the optimization, instead of via ad hoc postprocessing techniques (e.g., projection or clipping). We show that CGD is more successful on CIFAR10 (0.9–4.2%) and ImageNet (8.6–13.6%) than state-of-the-art attacks while consuming less time (11.4–18.8%). Statistical tests confirm that our attack outperforms others against leading defenses on different datasets and values of ϵ.
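The contrast the abstract draws between enforcing the distance bound inside the optimization and enforcing it by postprocessing can be illustrated with a minimal sketch. This is not the paper's formulation: the function names `linf_penalty` and `project_linf`, the parameter `eps`, and the choice of a hinge-style L∞ penalty are all assumptions made here for illustration.

```python
def linf_penalty(x_adv, x_orig, eps):
    """Hypothetical penalty term: how far each coordinate exceeds the bound.

    Returns 0.0 exactly when every coordinate of x_adv lies within eps of
    x_orig, so an optimizer can drive it to zero alongside the attack loss.
    """
    return sum(max(0.0, abs(a - o) - eps) for a, o in zip(x_adv, x_orig))


def project_linf(x_adv, x_orig, eps):
    """Postprocessing alternative: clip each coordinate back into the bound."""
    return [min(max(a, o - eps), o + eps) for a, o in zip(x_adv, x_orig)]


x_orig = [0.0, 0.0]
x_adv = [0.5, 0.125]
print(linf_penalty(x_adv, x_orig, eps=0.25))
print(project_linf(x_adv, x_orig, eps=0.25))
```

With a penalty, the bound becomes part of the objective being minimized; with projection, each step is corrected after the fact, which is the kind of ad hoc postprocessing the abstract says CGD avoids.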

2022-01-12
Lin, Weiran, Lucas, Keane, Bauer, Lujo, Reiter, Michael K., Sharif, Mahmood.  2021.  Constrained Gradient Descent: A Powerful and Principled Evasion Attack Against Neural Networks.
Minimal adversarial perturbations added to inputs have been shown to be effective at fooling deep neural networks. In this paper, we introduce several innovations that make white-box targeted attacks follow the intuition of the attacker's goal: to trick the model into assigning a higher probability to the target class than to any other, while staying within a specified distance from the original input. First, we propose a new loss function that explicitly captures the goal of targeted attacks, in particular by using the logits of all classes instead of just a subset, as is common. We show that Auto-PGD with this loss function finds more adversarial examples than it does with other commonly used loss functions. Second, we propose a new attack method that uses a further developed version of our loss function, capturing both the misclassification objective and the L∞ distance limit ϵ. In relative terms, this new attack method is 1.5–4.2% more successful on the CIFAR10 dataset and 8.2–14.9% more successful on the ImageNet dataset than the next best state-of-the-art attack. We confirm using statistical tests that our attack outperforms state-of-the-art attacks on different datasets and values of ϵ and against different defenses.
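The idea of a targeted loss that uses the logits of all classes can be sketched as a sum of hinge terms, one per non-target class. This is an illustrative assumption rather than the loss defined in the paper: the name `targeted_margin_loss` and the `margin` parameter are hypothetical.

```python
def targeted_margin_loss(logits, target, margin=0.0):
    """Hypothetical all-logits targeted loss.

    Adds one hinge term for every non-target class. The result is 0.0
    exactly when the target logit exceeds each other logit by at least
    `margin`, which is the success condition of a targeted attack.
    """
    z_t = logits[target]
    return sum(
        max(0.0, z + margin - z_t)
        for i, z in enumerate(logits)
        if i != target
    )


print(targeted_margin_loss([1.0, 3.0, 2.0], target=0))  # target not yet on top
print(targeted_margin_loss([5.0, 3.0, 2.0], target=0))  # target already on top
```

Because every competing class contributes its own term, the loss stays positive until the target class beats all of them, rather than only the currently strongest one.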