Rakin, Adnan Siraj, Chowdhuryy, Md Hafizul Islam, Yao, Fan, Fan, Deliang.
2022.
DeepSteal: Advanced Model Extractions Leveraging Efficient Weight Stealing in Memories. 2022 IEEE Symposium on Security and Privacy (SP). :1157–1174.
Recent advancements in Deep Neural Networks (DNNs) have enabled widespread deployment in multiple security-sensitive domains. The need for resource-intensive training and the use of valuable domain-specific training data have made these models the top intellectual property (IP) for model owners. One of the major threats to DNN privacy is model extraction attacks, where adversaries attempt to steal sensitive information in DNN models. In this work, we propose an advanced model extraction framework, DeepSteal, that for the first time steals DNN weights remotely with the aid of a memory side-channel attack. Our proposed DeepSteal comprises two key stages. First, we develop a new weight bit information extraction method, called HammerLeak, by adopting the rowhammer-based fault technique as the information leakage vector. HammerLeak leverages several novel system-level techniques tailored for DNN applications to enable fast and efficient weight stealing. Second, we propose a novel substitute model training algorithm with a Mean Clustering weight penalty, which leverages the partially leaked bit information effectively and generates a substitute prototype of the target victim model. We evaluate the proposed model extraction framework on three popular image datasets (e.g., CIFAR-10/100/GTSRB) and four DNN architectures (e.g., ResNet-18/34/Wide-ResNet/VGG-11). The extracted substitute model achieves more than 90% test accuracy on deep residual networks for the CIFAR-10 dataset. Moreover, the extracted substitute model can also generate effective adversarial input samples to fool the victim model; notably, it achieves performance similar to white-box adversarial input attacks (e.g., PGD/TRADES), i.e., 1–2% test accuracy under attack.
ISSN: 2375-1207
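As a hedged illustration of the substitute-training idea sketched in the abstract above (not the authors' code), the following Python/PyTorch snippet adds a mean-clustering-style weight penalty to an ordinary training loss: weights for which bit information has leaked are pulled toward the mean value implied by those bits. The names leaked_mask, leaked_mean, and the penalty weight lam are illustrative assumptions.

import torch
import torch.nn.functional as F

def substitute_loss(model, x, victim_labels, leaked_mask, leaked_mean, lam=1e-3):
    # Cross-entropy on victim-labeled queries plus a clustering penalty.
    # leaked_mask / leaked_mean: dicts keyed by parameter name, giving a 0/1
    # mask of weights with leaked bit information and the mean value implied
    # by those bits (both hypothetical inputs, for illustration only).
    logits = model(x)
    task_loss = F.cross_entropy(logits, victim_labels)
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in leaked_mask:
            diff = (param - leaked_mean[name]) * leaked_mask[name]
            penalty = penalty + diff.pow(2).sum()
    return task_loss + lam * penalty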
Lin, Xuanwei, Dong, Chen, Liu, Ximeng, Zhang, Yuanyuan.
2022.
SPA: An Efficient Adversarial Attack on Spiking Neural Networks using Spike Probabilistic. 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid). :366–375.
With the coming 6G era, spiking neural networks (SNNs) can serve as powerful processing tools in various areas, such as biometric recognition, AI robotics, autonomous driving, and healthcare, owing to their strong artificial intelligence (AI) processing capabilities. However, within Cyber-Physical Systems (CPS), SNNs are surprisingly vulnerable to adversarial examples generated from benign samples with human-imperceptible noise, which can lead to serious consequences such as face-recognition anomalies, loss of control in autonomous driving, and wrong medical diagnoses. Only by fully understanding the principles behind adversarial attacks and their adversarial samples can we defend against them. Most existing adversarial attacks cause severe accuracy degradation in trained SNNs, but they generate adversarial samples only by randomly adding, deleting, and flipping spike trains, which makes the samples easy to identify by filters, or even by human eyes; their attack performance and speed can also be improved further. Hence, this paper presents the Spike Probabilistic Attack (SPA), which aims to generate adversarial samples with smaller perturbations, greater model accuracy degradation, and faster iteration. SPA uses Poisson coding to generate spikes as probabilities, directly converting input data into spikes for faster speed, and generates uniformly distributed perturbations for better attack performance. Moreover, an objective function is constructed to keep perturbations small while maintaining the attack success rate, which speeds up convergence by adjusting parameters. Both white-box and black-box settings are used to evaluate the merits of SPA. Experimental results show that the model's accuracy under the white-box attack decreases by 9.25%–31.15%, better than other attacks, and the average success rate is 74.87% under the black-box setting. The results indicate that SPA has better attack performance than existing attacks in the white-box setting and better transferability in the black-box setting.
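Poisson (rate) coding, which the abstract above builds on, is a standard way to turn input intensities into spike trains. A minimal Python/PyTorch sketch follows (illustrative only, not the SPA implementation; num_steps is an assumed parameter):

import torch

def poisson_encode(image, num_steps=25):
    # image: float tensor scaled to [0, 1]; each intensity is treated as the
    # per-step firing probability, so the output spike train has shape
    # (num_steps, *image.shape) with binary entries.
    probs = image.clamp(0.0, 1.0).unsqueeze(0).expand(num_steps, *image.shape)
    return torch.bernoulli(probs)

An attack in this spirit would perturb the firing probabilities rather than individual discrete spikes, which is what keeps the resulting perturbation small and uniformly distributed.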
Li, Fang-Qi, Wang, Shi-Lin, Zhu, Yun.
2022.
Fostering The Robustness Of White-Box Deep Neural Network Watermarks By Neuron Alignment. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :3049–3053.
The wide application of deep learning techniques is boosting the regulation of deep learning models, especially deep neural networks (DNNs), as commercial products. A necessary prerequisite for such regulation is identifying the owner of a deep neural network, which is usually done through a watermark. Current DNN watermarking schemes, particularly white-box ones, are uniformly fragile against a family of functionality-equivalence attacks, especially neuron permutation. This operation can effortlessly invalidate the ownership proof and escape copyright regulation. To enhance the robustness of white-box DNN watermarking schemes, this paper presents a procedure that aligns neurons into the same order as when the watermark was embedded, so the watermark can be correctly recognized. This neuron alignment process significantly facilitates the functionality of established deep neural network watermarking schemes.
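A hedged sketch of the alignment idea (not the paper's exact procedure): recover the unknown neuron permutation of a suspect layer by matching its weight rows to the owner's reference copy from embedding time, for example with a Hungarian assignment on pairwise distances, before running watermark verification. Here reference_w and suspect_w are assumed to be the (num_neurons, fan_in) weight matrices of the corresponding layers.

import numpy as np
from scipy.optimize import linear_sum_assignment

def align_neurons(reference_w, suspect_w):
    # cost[i, j] = distance between reference neuron i and suspect neuron j
    cost = np.linalg.norm(reference_w[:, None, :] - suspect_w[None, :, :], axis=-1)
    _, col_ind = linear_sum_assignment(cost)
    # Reorder suspect rows so neuron i again matches reference neuron i,
    # undoing a functionality-preserving permutation before verification.
    return suspect_w[col_ind]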
Zhan, Yike, Zheng, Baolin, Wang, Qian, Mou, Ningping, Guo, Binqing, Li, Qi, Shen, Chao, Wang, Cong.
2022.
Towards Black-Box Adversarial Attacks on Interpretable Deep Learning Systems. 2022 IEEE International Conference on Multimedia and Expo (ICME). :1–6.
Recent works have empirically shown that neural network interpretability is susceptible to malicious manipulation. However, existing attacks against Interpretable Deep Learning Systems (IDLSes) all focus on the white-box setting, which is impractical in real-world scenarios. In this paper, we make the first attempt to attack IDLSes in the decision-based black-box setting. We propose a new framework, the Dual Black-box Adversarial Attack (DBAA), which generates adversarial examples that are misclassified as the target class yet have interpretations very similar to those of their benign counterparts. We conduct comprehensive experiments on different combinations of classifiers and interpreters to illustrate the effectiveness of DBAA. Empirical results show that in all cases, DBAA achieves high attack success rates and Intersection over Union (IoU) scores.
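For context, here is a minimal decision-based (label-only) search loop in the spirit the abstract describes, written as a hedged Python sketch rather than the DBAA algorithm itself: starting from a sample already classified as the target class, binary-search along the line toward the benign input while the target label is preserved; the interpretation-similarity (IoU) constraint would be enforced separately. query_label, target_start, and target_class are assumed placeholders.

import numpy as np

def boundary_search(benign, target_start, query_label, target_class, steps=20):
    lo, hi = 0.0, 1.0  # interpolation weight toward the benign input
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        candidate = (1.0 - mid) * target_start + mid * benign
        if query_label(candidate) == target_class:
            lo = mid   # still classified as the target: move closer to the benign input
        else:
            hi = mid   # label flipped back: retreat
    return (1.0 - lo) * target_start + lo * benign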