Appending Adversarial Frames for Universal Video Attack

Zhikai Chen, Lingxi Xie, Shanmin Pang, Yong He, Qi Tian; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 3199-3208


This paper investigates the problem of generating adversarial examples for video classification. We project all videos onto a semantic space and a perception space, and formulate adversarial attack as finding a counterpart that is close to the target in the perception space but far from it in the semantic space. Under this formulation, we observe that conventional attack methods mostly used Euclidean distance to measure the perception space. We instead exploit a property of videos: a modified video in which a few consecutive frames are replaced by dummy contents (e.g., a black frame with the text 'thank you for watching' on it) remains close to the original video in the perception space, despite a large Euclidean gap. This leads to a new attack approach which adds perturbations only on the newly-added frames. We show its high success rates in attacking six state-of-the-art video classification networks, as well as its universality, i.e., it transfers well across videos and models.
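The core mechanism described above can be illustrated with a minimal pure-Python sketch. All names below (`append_dummy_frames`, `apply_perturbation`) are hypothetical helpers, not the authors' code; the paper's actual optimization procedure and target models are not reproduced here. The sketch only shows the structural constraint that makes the attack "appending": the perturbation is confined to the appended dummy frames, so every original frame is left bit-for-bit untouched.

```python
def append_dummy_frames(video, k, frame_size):
    """Append k dummy (all-black) frames to a video.

    A video is modeled as a list of frames; each frame is a flat list of
    pixel intensities in [0, 1]. (Hypothetical representation for illustration.)
    """
    return video + [[0.0] * frame_size for _ in range(k)]


def apply_perturbation(video, delta, k):
    """Add perturbation delta only to the last k (appended) frames.

    delta is a list of k per-frame perturbations. Original frames are copied
    unchanged; perturbed pixels are clipped back to the valid range [0, 1].
    """
    out = [list(frame) for frame in video]
    for i in range(k):
        frame = out[len(video) - k + i]
        for j in range(len(frame)):
            frame[j] = min(1.0, max(0.0, frame[j] + delta[i][j]))
    return out


# Toy usage: a 2-frame "video" with 4 pixels per frame.
video = [[0.5] * 4, [0.6] * 4]
extended = append_dummy_frames(video, k=2, frame_size=4)
delta = [[0.2] * 4, [0.9] * 4]          # would be optimized against a model
adversarial = apply_perturbation(extended, delta, k=2)
```

In a full attack, `delta` would be optimized (e.g., by a gradient-based or query-based method) to push the classifier's prediction away from the true label; because `delta` is shared across inputs in the universal setting, the same appended frames can transfer across videos and models, as the abstract claims.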

Related Material

@InProceedings{Chen_2021_WACV,
    author    = {Chen, Zhikai and Xie, Lingxi and Pang, Shanmin and He, Yong and Tian, Qi},
    title     = {Appending Adversarial Frames for Universal Video Attack},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2021},
    pages     = {3199-3208}
}