Towards Black-Box Adversarial Attacks on Interpretable Deep Learning Systems

Title: Towards Black-Box Adversarial Attacks on Interpretable Deep Learning Systems
Publication Type: Conference Paper
Year of Publication: 2022
Authors: Zhan, Yike; Zheng, Baolin; Wang, Qian; Mou, Ningping; Guo, Binqing; Li, Qi; Shen, Chao; Wang, Cong
Conference Name: 2022 IEEE International Conference on Multimedia and Expo (ICME)
Keywords: adversarial examples, black-box attacks, composability, Deep Learning, Interpretable deep learning systems, Metrics, Multimedia systems, Neural networks, pubcrawl, Resiliency, security, White Box Security
Abstract: Recent works have empirically shown that neural network interpretability is susceptible to malicious manipulation. However, existing attacks against Interpretable Deep Learning Systems (IDLSes) all focus on the white-box setting, which is impractical in real-world scenarios. In this paper, we make the first attempt to attack IDLSes in the decision-based black-box setting. We propose a new framework called Dual Black-box Adversarial Attack (DBAA), which generates adversarial examples that are misclassified as the target class yet have interpretations very similar to those of their benign counterparts. We conduct comprehensive experiments on different combinations of classifiers and interpreters to illustrate the effectiveness of DBAA. Empirical results show that in all cases, DBAA achieves high attack success rates and Intersection over Union (IoU) scores.
DOI: 10.1109/ICME52920.2022.9859856
Citation Key: zhan_towards_2022