Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection

Yu, Bingyao; Li, Wanhua; Li, Xiu; Lu, Jiwen; Zhou, Jie

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 8188-8197

Abstract

In this paper, we propose a Frequency-Aware Spatiotemporal Transformer (FAST) for video inpainting detection, which aims to simultaneously mine the traces of video inpainting from the spatial, temporal, and frequency domains. Unlike existing deep video inpainting detection methods, which usually rely on hand-designed attention modules and memory mechanisms, FAST has an innate global self-attention mechanism that captures long-range relations. While existing video inpainting methods usually explore the spatial and temporal connections in a video, our method employs a spatiotemporal transformer framework to detect the spatial connections between patches and the temporal dependency between frames. Because inpainted videos usually lack high-frequency details, FAST also exploits frequency-domain information through a specifically designed decoder. Extensive experimental results demonstrate that our approach achieves competitive performance and generalizes well.
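
The abstract does not specify FAST's layers, so the following is only a minimal NumPy sketch of the two ideas it names, built on our own assumptions. First, self-attention among the patches of each frame, followed by self-attention across frames at each patch position; this factorized spatial-then-temporal layout is assumed for illustration, and the paper's actual attention design may differ. Second, a per-patch high-frequency energy ratio computed with a 2D FFT, used as a simple frequency-domain cue for regions that lack fine detail. The helper names (`patchify`, `attention`, `spatiotemporal_block`, `high_freq_ratio`), the identity Q/K/V projections, the 0.25 cutoff, and the flat-fill "inpainting" are all hypothetical; this is not the authors' FAST model or its decoder.

```python
import numpy as np


def patchify(video, p):
    """Split a (T, H, W) grayscale clip into non-overlapping p x p patches.

    Returns shape (T, N, p*p), where N = (H // p) * (W // p), in row-major patch order.
    """
    t, h, w = video.shape
    x = video[:, : h - h % p, : w - w % p]           # crop to a multiple of p
    x = x.reshape(t, h // p, p, w // p, p).transpose(0, 1, 3, 2, 4)
    return x.reshape(t, (h // p) * (w // p), p * p)


def attention(x):
    """Single-head scaled dot-product self-attention over axis -2.

    Identity Q/K/V projections are a simplification (assumption); a trained
    transformer would learn separate projection matrices.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x


def spatiotemporal_block(tokens):
    """Hypothetical factorized block: spatial attention, then temporal attention."""
    spatial = tokens + attention(tokens)             # (T, N, D): patches attend within a frame
    temporal = np.swapaxes(spatial, 0, 1)            # (N, T, D): one patch position across frames
    temporal = temporal + attention(temporal)
    return np.swapaxes(temporal, 0, 1)               # back to (T, N, D)


def high_freq_ratio(patches, p, cutoff=0.25):
    """Fraction of each patch's spectral energy above a normalized radial frequency."""
    blocks = patches.reshape(*patches.shape[:-1], p, p)
    spec = np.abs(np.fft.fft2(blocks)) ** 2          # FFT over the last two axes
    f = np.fft.fftfreq(p)
    radius = np.sqrt(f[:, None] ** 2 + f[None, :] ** 2)
    high = spec[..., radius > cutoff].sum(axis=-1)
    total = spec.sum(axis=(-2, -1)) + 1e-12
    return high / total                              # shape (T, N)


rng = np.random.default_rng(0)
clip = rng.random((4, 32, 32))                       # hypothetical 4-frame grayscale clip
fake = clip.copy()
fake[:, 8:16, 8:16] = clip[:, 8:16, 8:16].mean()     # crude "inpainting": a flat fill

p = 8
tokens = patchify(fake, p)                           # (4, 16, 64)
features = spatiotemporal_block(tokens)              # same shape as tokens
ratios = high_freq_ratio(tokens, p)                  # (4, 16); patch 5 covers rows 8:16, cols 8:16
print(features.shape, ratios[:, 5].round(3))
```

In this toy clip the flat-filled patch has almost no energy above the cutoff, while the noisy patches around it keep a clearly nonzero ratio. That contrast is the intuition behind pairing a spatiotemporal transformer with a frequency-aware branch, although a real detector would learn its frequency features rather than rely on a fixed threshold.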

Related Material

@InProceedings{Yu_2021_ICCV,
    author    = {Yu, Bingyao and Li, Wanhua and Li, Xiu and Lu, Jiwen and Zhou, Jie},
    title     = {Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {8188-8197}
}