Fooling End-To-End Speaker Verification With Adversarial Examples

Title: Fooling End-To-End Speaker Verification With Adversarial Examples
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Kreuk, F., Adi, Y., Cisse, M., Keshet, J.
Conference Name: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Keywords: adversarial examples, Automatic speaker verification, automatic speaker verification systems, black-box attacks, composability, deep end-to-end network, end-to-end deep neural models, fooling end-to-end speaker verification, Mel frequency cepstral coefficient, Metrics, MFCC, neural nets, Neural networks, NTIMIT, original speaker examples, Perturbation methods, pubcrawl, resilience, security of data, speaker recognition, Standards, Task Analysis, Training, White Box Security, YOHO
Abstract: Automatic speaker verification systems are increasingly used as the primary means to authenticate customers. Recently, it has been proposed to train speaker verification systems using end-to-end deep neural models. In this paper, we show that such systems are vulnerable to adversarial example attacks. Adversarial examples are generated by adding a peculiar noise to original speaker examples, in such a way that the result is almost indistinguishable from the original to a human listener. Yet the generated waveforms, which sound like speaker A, can be used to fool such a system into accepting them as if they were uttered by speaker B. We present white-box attacks on a deep end-to-end network that was trained on either YOHO or NTIMIT. We also present two black-box attacks. In the first one, we generate adversarial examples with a system trained on NTIMIT and perform the attack on a system trained on YOHO. In the second one, we generate the adversarial examples with a system trained using Mel-spectrum features and perform the attack on a system trained using MFCCs. Our results show that one can significantly decrease the accuracy of a target system even when the adversarial examples are generated with a different system, potentially using different features.
DOI: 10.1109/ICASSP.2018.8462693
Citation Key: kreuk_fooling_2018