Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription

Title: Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Schonherr, L., Zeiler, S., Kolossa, D.
Conference Name: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Keywords: acoustic coupling, Acoustic signal processing, acoustic speaker recognition systems, audio-visual speaker recognition, audio-visual synchronicity, audio-visual systems, bimodal replay attack, coupled hidden Markov models, Cyber-physical systems, data synchronicity, feature extraction, Hidden Markov models, Human Behavior, liveness detection, multimodal biometrics, pubcrawl, replayed synthesized utterances, resilience, Resiliency, Scalability, security of data, speaker recognition, Speech recognition, spoofing attacks, spoofing detection, Streaming media, text-dependent spoofing detection, Training, visualization
Abstract

Acoustic speaker recognition systems are very vulnerable to spoofing attacks via replayed or synthesized utterances. One possible countermeasure is audio-visual speaker recognition. Nevertheless, the addition of the visual stream alone does not prevent spoofing attacks completely and only provides further information to assess the authenticity of the utterance. Many systems consider audio and video modalities independently and can easily be spoofed by imitating only a single modality or by a bimodal replay attack with a victim's photograph or video. Therefore, we propose the simultaneous verification of the data synchronicity and the transcription in a challenge-response setup. We use coupled hidden Markov models (CHMMs) for text-dependent spoofing detection and introduce new features that provide information about the transcription of the utterance and the synchronicity of both streams. We evaluate the features for various spoofing scenarios and show that the combination of the features leads to more robust recognition, also in comparison to the baseline method. Additionally, by evaluating the data on unseen speakers, we show the spoofing detection to be applicable in speaker-independent use cases.

URL: https://ieeexplore.ieee.org/document/8268990
DOI: 10.1109/ASRU.2017.8268990
Citation Key: schonherr_spoofing_2017