Biblio
Filters: Keyword is automatic speech recognition [Clear All Filters]
aaeCAPTCHA: The Design and Implementation of Audio Adversarial CAPTCHA. 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P). :430–447.
.
2022. CAPTCHAs are designed to prevent malicious bot programs from abusing websites. Most online service providers deploy audio CAPTCHAs as an alternative to text and image CAPTCHAs for visually impaired users. However, prior research investigating the security of audio CAPTCHAs found them highly vulnerable to automated attacks using Automatic Speech Recognition (ASR) systems. To improve the robustness of audio CAPTCHAs against automated abuses, we present the design and implementation of an audio adversarial CAPTCHA (aaeCAPTCHA) system in this paper. The aaeCAPTCHA system exploits audio adversarial examples as CAPTCHAs to prevent the ASR systems from automatically solving them. Furthermore, we conducted a rigorous security evaluation of our new audio CAPTCHA design against five state-of-the-art DNN-based ASR systems and three commercial Speech-to-Text (STT) services. Our experimental evaluations demonstrate that aaeCAPTCHA is highly secure against these speech recognition technologies, even when the attacker has complete knowledge of the current attacks against audio adversarial examples. We also conducted a usability evaluation of the proof-of-concept implementation of the aaeCAPTCHA scheme. Our results show that it achieves high robustness at a moderate usability cost compared to normal audio CAPTCHAs. Finally, our extensive analysis highlights that aaeCAPTCHA can significantly enhance the security and robustness of traditional audio CAPTCHA systems while maintaining similar usability.
Catch Me If You Can: Blackbox Adversarial Attacks on Automatic Speech Recognition using Frequency Masking. 2022 29th Asia-Pacific Software Engineering Conference (APSEC). :169–178.
.
2022. Automatic speech recognition (ASR) models are used widely in applications for voice navigation and voice control of domestic appliances. ASRs have been misused by attackers to generate malicious outputs by attacking the deep learning component within ASRs. To assess the security and robustnesss of ASRs, we propose techniques within our framework SPAT that generate blackbox (agnostic to the DNN) adversarial attacks that are portable across ASRs. This is in contrast to existing work that focuses on whitebox attacks that are time consuming and lack portability. Our techniques generate adversarial attacks that have no human audible difference by manipulating the input speech signal using a psychoacoustic model that maintains the audio perturbations below the thresholds of human perception. We propose a framework SPAT with three attack generation techniques based on the psychoacoustic concept and frame selection techniques to selectively target the attack. We evaluate portability and effectiveness of our techniques using three popular ASRs and two input audio datasets using the metrics- Word Error Rate (WER) of output transcription, Similarity to original audio, attack Success Rate on different ASRs and Detection score by a defense system. We found our adversarial attacks were portable across ASRs, not easily detected by a state-of the-art defense system, and had significant difference in output transcriptions while sounding similar to original audio.
Generating Audio Adversarial Examples with Ensemble Substituted Models. ICC 2021 - IEEE International Conference on Communications. :1–6.
.
2021. The rapid development of machine learning technology has prompted the applications of Automatic Speech Recognition(ASR). However, studies have shown that the state-of-the-art ASR technologies are still vulnerable to various attacks, which undermines the stability of ASR destructively. In general, most of the existing attack techniques for the ASR model are based on white box scenarios, where the adversary uses adversarial samples to generate a substituted model corresponding to the target model. On the contrary, there are fewer attack schemes in the black-box scenario. Moreover, no scheme considers the problem of how to construct the architecture of the substituted models. In this paper, we point out that constructing a good substituted model architecture is crucial to the effectiveness of the attack, as it helps to generate a more sophisticated set of adversarial examples. We evaluate the performance of different substituted models by comprehensive experiments, and find that ensemble substituted models can achieve the optimal attack effect. The experiment shows that our approach performs attack over 80% success rate (2% improvement compared to the latest work) meanwhile maintaining the authenticity of the original sample well.
A Practical Black-Box Attack Against Autonomous Speech Recognition Model. GLOBECOM 2020 - 2020 IEEE Global Communications Conference. :1–6.
.
2020. With the wild applications of machine learning (ML) technology, automatic speech recognition (ASR) has made great progress in recent years. Despite its great potential, there are various evasion attacks of ML-based ASR, which could affect the security of applications built upon ASR. Up to now, most studies focus on white-box attacks in ASR, and there is almost no attention paid to black-box attacks where attackers can only query the target model to get output labels rather than probability vectors in audio domain. In this paper, we propose an evasion attack against ASR in the above-mentioned situation, which is more feasible in realistic scenarios. Specifically, we first train a substitute model by using data augmentation, which ensures that we have enough samples to train with a small number of times to query the target model. Then, based on the substitute model, we apply Differential Evolution (DE) algorithm to craft adversarial examples and implement black-box attack against ASR models from the Speech Commands dataset. Extensive experiments are conducted, and the results illustrate that our approach achieves untargeted attacks with over 70% success rate while still maintaining the authenticity of the original data well.
Audio Adversarial Examples: Targeted Attacks on Speech-to-Text. 2018 IEEE Security and Privacy Workshops (SPW). :1–7.
.
2018. We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (recognizing up to 50 characters per second of audio). We apply our white-box iterative optimization-based attack to Mozilla's implementation DeepSpeech end-to-end, and show it has a 100% success rate. The feasibility of this attack introduce a new domain to study adversarial examples.