Adversarial Video Captioning

Title: Adversarial Video Captioning
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Adari, Suman Kalyan; Garcia, Washington; Butler, Kevin
Conference Name: 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)
Keywords: adversarial, adversarial attack, Adversarial Machine Learning, adversarial machine learning techniques, adversarial video captioning, Computer architecture, Computer vision, cosine similarity, Deep Learning, deep learning models, deep video, Force, image captioning attack, image captioning attacks, image domain, learning (artificial intelligence), machine learning, Metrics, Optimization, Perturbation methods, pubcrawl, resilience, Resiliency, Scalability, security of data, Streaming media, target captions, targeted, targeted attacks, Task Analysis, video captioning, video captioning model, video captioning task, video domain, video playback quality, video signal processing, video stream, video streaming
Abstract: In recent years, developments in the field of computer vision have allowed deep learning-based techniques to surpass human-level performance. However, these advances have also culminated in the advent of adversarial machine learning techniques, capable of launching targeted image captioning attacks that easily fool deep learning models. Although attacks in the image domain are well studied, little work has been done in the video domain. In this paper, we show it is possible to extend prior attacks in the image domain to the video captioning task, without heavily affecting the video's playback quality. We demonstrate our attack against a state-of-the-art video captioning model, by extending a prior image captioning attack known as Show and Fool. To the best of our knowledge, this is the first successful method for targeted attacks against a video captioning model, which is able to inject 'subliminal' perturbations into the video stream, and force the model to output a chosen caption with up to 0.981 cosine similarity, achieving near-perfect similarity to chosen target captions.
DOI: 10.1109/DSN-W.2019.00012
Citation Key: adari_adversarial_2019
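
Note on the metric: the abstract reports success as the cosine similarity between the caption the model outputs and the chosen target caption. The record does not say how captions are turned into vectors, so the sketch below is only an illustration under an assumed representation (lowercased, whitespace-split word counts). The function name caption_cosine_similarity is hypothetical and is not taken from the paper.

```python
import math
from collections import Counter


def caption_cosine_similarity(caption_a: str, caption_b: str) -> float:
    """Cosine similarity between two captions.

    Assumption (not from the paper): each caption is represented as a
    bag-of-words count vector over lowercased, whitespace-split tokens.
    """
    counts_a = Counter(caption_a.lower().split())
    counts_b = Counter(caption_b.lower().split())

    # Dot product over the words the two captions share.
    shared_words = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared_words)

    # Euclidean norms of the two count vectors.
    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))

    # An empty caption has no direction; treat its similarity as 0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under this assumed representation, identical captions score 1 (up to floating-point rounding) and captions with no words in common score 0. For instance, "a man is playing a guitar" against "a man is playing a piano" gives approximately 0.875. A learned sentence embedding would give different values for the same pair.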