Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription
Title | Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Schönherr, L., Zeiler, S., Kolossa, D. |
Conference Name | 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |
Keywords | acoustic coupling, Acoustic signal processing, acoustic speaker recognition systems, audio-visual speaker recognition, audio-visual synchronicity, audio-visual systems, bimodal replay attack, coupled hidden Markov models, Cyber-physical systems, data synchronicity, feature extraction, Hidden Markov models, Human Behavior, liveness detection, multimodal biometrics, pubcrawl, replayed synthesized utterances, resilience, Resiliency, Scalability, security of data, speaker recognition, Speech recognition, spoofing attacks, spoofing detection, Streaming media, text-dependent spoofing detection, Training, visualization |
Abstract | Acoustic speaker recognition systems are very vulnerable to spoofing attacks via replayed or synthesized utterances. One possible countermeasure is audio-visual speaker recognition. Nevertheless, adding the visual stream alone does not prevent spoofing attacks completely; it only provides further information for assessing the authenticity of the utterance. Many systems consider the audio and video modalities independently and can easily be spoofed by imitating only a single modality or by a bimodal replay attack with a victim's photograph or video. Therefore, we propose the simultaneous verification of the data synchronicity and the transcription in a challenge-response setup. We use coupled hidden Markov models (CHMMs) for text-dependent spoofing detection and introduce new features that provide information about the transcription of the utterance and the synchronicity of both streams. We evaluate the features for various spoofing scenarios and show that combining them leads to more robust recognition, also in comparison to the baseline method. Additionally, by evaluating the data on unseen speakers, we show the spoofing detection to be applicable in speaker-independent use cases. |
URL | https://ieeexplore.ieee.org/document/8268990 |
DOI | 10.1109/ASRU.2017.8268990 |
Citation Key | schonherr_spoofing_2017 |
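As a rough illustration of the synchronicity idea described in the abstract, the sketch below scores how well an audio energy envelope tracks a lip-openness trace via zero-lag normalized cross-correlation. This is only a toy proxy under assumed inputs, not the paper's coupled-HMM features; the function name and signals are hypothetical.

```python
import numpy as np

def sync_score(audio_energy, lip_openness):
    """Zero-lag normalized cross-correlation between an audio energy
    envelope and a lip-openness trace sampled at the same frame rate.
    Scores near 1 suggest the streams move together; scores near 0
    suggest a desynchronized (possibly spoofed) pairing.
    Toy proxy only, not the CHMM-based features from the paper."""
    a = (audio_energy - audio_energy.mean()) / (audio_energy.std() + 1e-12)
    v = (lip_openness - lip_openness.mean()) / (lip_openness.std() + 1e-12)
    return float(np.mean(a * v))

# Toy frames: a synchronous pair vs. a temporally scrambled video stream,
# standing in for a live recording vs. a bimodal replay attack.
rng = np.random.default_rng(0)
audio = np.abs(np.sin(np.linspace(0, 6 * np.pi, 200))) + 0.05 * rng.standard_normal(200)
video_sync = audio + 0.05 * rng.standard_normal(200)
video_replay = rng.permutation(video_sync)

live_score = sync_score(audio, video_sync)
spoof_score = sync_score(audio, video_replay)
```

On this toy data the live pairing scores far higher than the scrambled one, which is the intuition behind rejecting replays whose streams are not synchronous.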