Biblio
Voice-based input is commonly used as the primary input method for augmented reality (AR) headsets because it preserves the immersive AR experience and offers good recognition performance. However, recent research has shown that an attacker can inject inaudible voice commands into devices that lack voice verification. Even if voice input is secured with voice verification techniques, an attacker can easily steal the victim's voice using low-cost handy recorders and replay it to voice-based applications. To defend against voice-spoofing attacks, an AR headset should be able to determine whether the voice comes from the person who is wearing it. Existing voice-spoofing defense systems are designed for smartphone platforms. Because of the particular locations of the microphones and loudspeakers on AR headsets, these solutions are difficult to implement on AR headsets. To address this challenge, we propose a voice-spoofing defense system for AR headsets that leverages both the internal body propagation and the air propagation of human voices. Experimental results show that our system accepts normal users with an average accuracy of 97% and defends against two types of attacks with an average accuracy of at least 98%.
Acoustic speaker recognition systems are highly vulnerable to spoofing attacks via replayed or synthesized utterances. One possible countermeasure is audio-visual speaker recognition. Nevertheless, adding the visual stream alone does not prevent spoofing attacks completely; it only provides further information for assessing the authenticity of the utterance. Many systems consider the audio and video modalities independently and can easily be spoofed by imitating only a single modality or by a bimodal replay attack with a victim's photograph or video. Therefore, we propose the simultaneous verification of data synchronicity and transcription in a challenge-response setup. We use coupled hidden Markov models (CHMMs) for text-dependent spoofing detection and introduce new features that provide information about the transcription of the utterance and the synchronicity of both streams. We evaluate the features for various spoofing scenarios and show that combining the features leads to more robust recognition, including in comparison to the baseline method. Additionally, by evaluating the data on unseen speakers, we show that the spoofing detection is applicable in speaker-independent use cases.
Impostors gain unauthorized access to biometric recognition systems by presenting fake biometric data of a legitimate user, an attack known as spoofing. Face recognition systems are spoofed using photographs, 3D models, and videos of the user. An attack video contains noise from the acquisition process. In this work, we use the noise residual content of the video to detect spoofed videos. We use the wavelet transform to represent the noise video. A visual rhythm image, built from samples of the noise video, is created for each video. Local Binary Pattern (LBP) and uniform Local Binary Pattern (LBPu2) features are extracted from the visual rhythm image and then classified using a Support Vector Machine (SVM). Because videos are large and many frames are used for analysis, execution time is high. In this work, the spoof detection algorithm is applied to subsections of the video frames at various levels, which reduces execution time while retaining reasonable detection accuracy.
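The LBPu2 feature extraction named in the abstract above can be illustrated with a minimal sketch. It assumes the standard 8-neighbour, radius-1 uniform LBP with 59 histogram bins; the abstract does not state the neighbourhood, the radius, or how the visual rhythm image is built, so the function names (`lbp_code`, `lbp_u2_histogram`) and the plain nested-list image format are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import Dict, List, Sequence

# Assumed neighbourhood: 8 neighbours at radius 1, listed clockwise as (row, col) offsets.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: Sequence[Sequence[float]], r: int, c: int) -> int:
    """Basic 8-bit LBP code: bit i is set when neighbour i >= the centre pixel."""
    centre = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        if image[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def transitions(code: int) -> int:
    """Count 0/1 changes when the 8 bits are read as a circular string."""
    rotated = ((code << 1) | (code >> 7)) & 0xFF
    return bin(code ^ rotated).count("1")


# LBPu2 mapping: each uniform code (at most 2 transitions) gets its own bin;
# every non-uniform code falls into one shared final bin.
UNIFORM_CODES: List[int] = [code for code in range(256) if transitions(code) <= 2]
U2_BIN: Dict[int, int] = {code: i for i, code in enumerate(UNIFORM_CODES)}
NON_UNIFORM_BIN = len(UNIFORM_CODES)


def lbp_u2_histogram(image: Sequence[Sequence[float]]) -> List[float]:
    """Normalised LBPu2 histogram over the interior pixels of a 2-D image."""
    hist = [0.0] * (NON_UNIFORM_BIN + 1)
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[U2_BIN.get(lbp_code(image, r, c), NON_UNIFORM_BIN)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

Under these assumptions, the histogram computed from a visual rhythm image would serve as the feature vector passed to an SVM classifier. The classifier itself is omitted here because the abstract gives no kernel or parameter details.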