Biblio
Filters: Keyword is Interpretable deep learning systems [Clear All Filters]
Towards Black-Box Adversarial Attacks on Interpretable Deep Learning Systems. 2022 IEEE International Conference on Multimedia and Expo (ICME). :1–6.
.
2022. Recent works have empirically shown that neural network interpretability is susceptible to malicious manipulations. However, existing attacks against Interpretable Deep Learning Systems (IDLSes) all focus on the white-box setting, which is obviously unpractical in real-world scenarios. In this paper, we make the first attempt to attack IDLSes in the decision-based black-box setting. We propose a new framework called Dual Black-box Adversarial Attack (DBAA) which can generate adversarial examples that are misclassified as the target class, yet have very similar interpretations to their benign cases. We conduct comprehensive experiments on different combinations of classifiers and interpreters to illustrate the effectiveness of DBAA. Empirical results show that in all the cases, DBAA achieves high attack success rates and Intersection over Union (IoU) scores.