Biblio

Filters: Keyword is audio-visual systems
2021-01-15
Katarya, R., Lal, A.  2020.  A Study on Combating Emerging Threat of Deepfake Weaponization. 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :485–490.
A breakthrough in the emerging use of machine learning and deep learning is the concept of autoencoders and GANs (Generative Adversarial Networks), architectures that can generate believable synthetic content called deepfakes. The threat arises when these low-tech doctored images, videos, and audio clips blur the line between fake and genuine content and are used as weapons to cause damage to an unprecedented degree. This paper presents a survey of the underlying technology of deepfakes and of the methods proposed for their detection. Based on a detailed study of the proposed detection models, the paper identifies SSTNet, which uses spatial, temporal, and steganalysis features, as the best model to date. The threat posed by document and signature forgery, which has yet to be explored by researchers, is also highlighted. The paper concludes with a discussion of research directions in this field and of the development of more robust techniques to deal with the increasing threats surrounding deepfake technology.
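For orientation, the spatial-plus-temporal detection pattern that models like SSTNet build on can be sketched in a few lines. The PyTorch code below is an illustrative stand-in, not the SSTNet architecture from the paper: the class name, layer widths, and the omission of a steganalysis branch are all assumptions made for brevity.

    # Minimal sketch of a spatio-temporal deepfake detector (PyTorch).
    # NOT the SSTNet model from the paper; sizes are illustrative only.
    import torch
    import torch.nn as nn

    class SpatioTemporalDetector(nn.Module):
        def __init__(self, hidden=128):
            super().__init__()
            # Spatial stem: per-frame CNN features (illustrative widths).
            self.spatial = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),    # -> (N, 32)
            )
            # Temporal model: GRU over the per-frame feature sequence.
            self.temporal = nn.GRU(32, hidden, batch_first=True)
            # Binary head: one logit, real (0) vs. fake (1).
            self.head = nn.Linear(hidden, 1)

        def forward(self, clips):                        # (B, T, 3, H, W)
            b, t = clips.shape[:2]
            frames = clips.flatten(0, 1)                 # (B*T, 3, H, W)
            feats = self.spatial(frames).view(b, t, -1)  # (B, T, 32)
            _, last = self.temporal(feats)               # (1, B, hidden)
            return self.head(last.squeeze(0))            # (B, 1) logits

    if __name__ == "__main__":
        model = SpatioTemporalDetector()
        logits = model(torch.randn(2, 8, 3, 64, 64))     # 2 clips, 8 frames
        print(logits.shape)                              # torch.Size([2, 1])

The design point the sketch illustrates is that per-frame spatial features alone miss temporal artifacts such as inter-frame flicker, which is why a recurrent layer runs over the frame sequence before classification.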
2018-12-10
Schönherr, L., Zeiler, S., Kolossa, D.  2017.  Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). :591–598.

Acoustic speaker recognition systems are very vulnerable to spoofing attacks via replayed or synthesized utterances. One possible countermeasure is audio-visual speaker recognition. Nevertheless, the addition of the visual stream alone does not prevent spoofing attacks completely and only provides further information to assess the authenticity of the utterance. Many systems consider the audio and video modalities independently and can easily be spoofed by imitating only a single modality or by a bimodal replay attack with a victim's photograph or video. Therefore, we propose the simultaneous verification of data synchronicity and transcription in a challenge-response setup. We use coupled hidden Markov models (CHMMs) for text-dependent spoofing detection and introduce new features that provide information about the transcription of the utterance and the synchronicity of both streams. We evaluate the features for various spoofing scenarios and show that their combination leads to more robust recognition, also in comparison to the baseline method. Additionally, by evaluating the data on unseen speakers, we show the spoofing detection to be applicable in speaker-independent use cases.
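The central idea of verifying audio-visual synchronicity can be illustrated with a much simpler measure than the paper's coupled-HMM features. The sketch below is a hypothetical stand-in: it assumes two precomputed, frame-aligned 1-D signals (frame-wise audio energy and a mouth-opening measure, both illustrative inputs), and scores their peak normalized cross-correlation; all function and variable names are invented for this example.

    # Minimal sketch of an audio-visual synchrony check. An illustrative
    # stand-in for the CHMM-based features in the paper, not the method.
    import numpy as np

    def synchrony_score(audio_energy, mouth_opening, max_lag=5):
        """Peak normalized cross-correlation within +/- max_lag frames.

        High scores suggest the streams move together (live speaker);
        low scores suggest a replayed or spliced modality.
        """
        a = (audio_energy - audio_energy.mean()) / (audio_energy.std() + 1e-8)
        v = (mouth_opening - mouth_opening.mean()) / (mouth_opening.std() + 1e-8)
        n, best = len(a), -1.0
        for lag in range(-max_lag, max_lag + 1):
            if lag >= 0:
                x, y = a[lag:], v[:n - lag]
            else:
                x, y = a[:n + lag], v[-lag:]
            best = max(best, float(np.dot(x, y)) / len(x))
        return best

    # Toy usage: a synchronous pair scores high; a shuffled (spoofed)
    # video stream scores near zero.
    t = np.linspace(0, 4 * np.pi, 200)
    audio = np.abs(np.sin(t)) + 0.05 * np.random.randn(200)
    video = np.abs(np.sin(t - 0.1)) + 0.05 * np.random.randn(200)
    print(synchrony_score(audio, video))                         # close to 1
    print(synchrony_score(audio, np.random.permutation(video)))  # near 0

A single-modality attack, such as replaying audio against a victim's photograph, leaves the visual signal flat or uncorrelated, which is exactly what a synchrony measure of this kind is meant to expose.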