Visible to the public Biblio

Filters: Keyword is speaker recognition  [Clear All Filters]
2023-03-31
Zhang, Junjian, Tan, Hao, Deng, Binyue, Hu, Jiacen, Zhu, Dong, Huang, Linyi, Gu, Zhaoquan.  2022.  NMI-FGSM-Tri: An Efficient and Targeted Method for Generating Adversarial Examples for Speaker Recognition. 2022 7th IEEE International Conference on Data Science in Cyberspace (DSC). :167–174.
Most existing deep neural networks (DNNs) are inexplicable and fragile, which can be easily deceived by carefully designed adversarial example with tiny undetectable noise. This allows attackers to cause serious consequences in many DNN-assisted scenarios without human perception. In the field of speaker recognition, the attack for speaker recognition system has been relatively mature. Most works focus on white-box attacks that assume the information of the DNN is obtainable, and only a few works study gray-box attacks. In this paper, we study blackbox attacks on the speaker recognition system, which can be applied in the real world since we do not need to know the system information. By combining the idea of transferable attack and query attack, our proposed method NMI-FGSM-Tri can achieve the targeted goal by misleading the system to recognize any audio as a registered person. Specifically, our method combines the Nesterov accelerated gradient (NAG), the ensemble attack and the restart trigger to design an attack method that generates the adversarial audios with good performance to attack blackbox DNNs. The experimental results show that the effect of the proposed method is superior to the extant methods, and the attack success rate can reach as high as 94.8% even if only one query is allowed.
2021-07-08
Hou, Dai, Han, Hao, Novak, Ed.  2020.  TAES: Two-factor Authentication with End-to-End Security against VoIP Phishing. 2020 IEEE/ACM Symposium on Edge Computing (SEC). :340—345.
In the current state of communication technology, the abuse of VoIP has led to the emergence of telecommunications fraud. We urgently need an end-to-end identity authentication mechanism to verify the identity of the caller. This paper proposes an end-to-end, dual identity authentication mechanism to solve the problem of telecommunications fraud. Our first technique is to use the Hermes algorithm of data transmission technology on an unknown voice channel to transmit the certificate, thereby authenticating the caller's phone number. Our second technique uses voice-print recognition technology and a Gaussian mixture model (a general background probabilistic model) to establish a model of the speaker to verify the caller's voice to ensure the speaker's identity. Our solution is implemented on the Android platform, and simultaneously tests and evaluates transmission efficiency and speaker recognition. Experiments conducted on Android phones show that the error rate of the voice channel transmission signature certificate is within 3.247 %, and the certificate signature verification mechanism is feasible. The accuracy of the voice-print recognition is 72%, making it effective as a reference for identity authentication.
2019-01-16
Kreuk, F., Adi, Y., Cisse, M., Keshet, J..  2018.  Fooling End-To-End Speaker Verification With Adversarial Examples. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :1962–1966.
Automatic speaker verification systems are increasingly used as the primary means to authenticate costumers. Recently, it has been proposed to train speaker verification systems using end-to-end deep neural models. In this paper, we show that such systems are vulnerable to adversarial example attacks. Adversarial examples are generated by adding a peculiar noise to original speaker examples, in such a way that they are almost indistinguishable, by a human listener. Yet, the generated waveforms, which sound as speaker A can be used to fool such a system by claiming as if the waveforms were uttered by speaker B. We present white-box attacks on a deep end-to-end network that was either trained on YOHO or NTIMIT. We also present two black-box attacks. In the first one, we generate adversarial examples with a system trained on NTIMIT and perform the attack on a system that trained on YOHO. In the second one, we generate the adversarial examples with a system trained using Mel-spectrum features and perform the attack on a system trained using MFCCs. Our results show that one can significantly decrease the accuracy of a target system even when the adversarial examples are generated with different system potentially using different features.
2018-12-10
Schonherr, L., Zeiler, S., Kolossa, D..  2017.  Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). :591–598.

Acoustic speaker recognition systems are very vulnerable to spoofing attacks via replayed or synthesized utterances. One possible countermeasure is audio-visual speaker recognition. Nevertheless, the addition of the visual stream alone does not prevent spoofing attacks completely and only provides further information to assess the authenticity of the utterance. Many systems consider audio and video modalities independently and can easily be spoofed by imitating only a single modality or by a bimodal replay attack with a victim's photograph or video. Therefore, we propose the simultaneous verification of the data synchronicity and the transcription in a challenge-response setup. We use coupled hidden Markov models (CHMMs) for a text-dependent spoofing detection and introduce new features that provide information about the transcriptions of the utterance and the synchronicity of both streams. We evaluate the features for various spoofing scenarios and show that the combination of the features leads to a more robust recognition, also in comparison to the baseline method. Additionally, by evaluating the data on unseen speakers, we show the spoofing detection to be applicable in speaker-independent use-cases.

2018-01-10
Aono, K., Chakrabartty, S., Yamasaki, T..  2017.  Infrasonic scene fingerprinting for authenticating speaker location. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :361–365.
Ambient infrasound with frequency ranges well below 20 Hz is known to carry robust navigation cues that can be exploited to authenticate the location of a speaker. Unfortunately, many of the mobile devices like smartphones have been optimized to work in the human auditory range, thereby suppressing information in the infrasonic region. In this paper, we show that these ultra-low frequency cues can still be extracted from a standard smartphone recording by using acceleration-based cepstral features. To validate our claim, we have collected smartphone recordings from more than 30 different scenes and used the cues for scene fingerprinting. We report scene recognition rates in excess of 90% and a feature set analysis reveals the importance of the infrasonic signatures towards achieving the state-of-the-art recognition performance.