Semantic-Aware Dynamic Parameter for Video Inpainting Transformer

Eunhye Lee, Jinsu Yoo, Yunjeong Yang, Sungyong Baik, Tae Hyun Kim; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 12949-12958

Abstract

Recent learning-based video inpainting approaches have achieved considerable progress. However, they still cannot fully exploit the semantic information within video frames and often predict improper scene layouts, failing to restore clear object boundaries in mixed scenes. To mitigate this problem, we introduce a new transformer-based video inpainting technique that exploits semantic information within the input and considerably improves reconstruction quality. In this study, we adopt a mixture-of-experts scheme and train multiple experts to handle mixed scenes containing various semantics. We leverage these experts to produce locally (token-wise) different network parameters and thereby achieve semantic-aware inpainting results. Extensive experiments on the YouTube-VOS and DAVIS benchmark datasets demonstrate that, compared with existing video inpainting approaches, the proposed method is superior at synthesizing visually pleasing videos with much clearer semantic structures and textures.
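
The token-wise dynamic-parameter idea described in the abstract can be sketched as follows: a router scores each token's features against a set of experts, and the gated blend of expert weight matrices becomes that token's own linear transform. Below is a minimal PyTorch sketch under our own assumptions; the module name SemanticMoELinear, the softmax router, and the choice of a linear expert layer are illustrative and not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticMoELinear(nn.Module):
    """Illustrative sketch: gates blend expert weights per token (not the authors' code)."""

    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        # One weight matrix and bias per expert.
        self.expert_w = nn.Parameter(torch.randn(num_experts, dim, dim) * dim ** -0.5)
        self.expert_b = nn.Parameter(torch.zeros(num_experts, dim))
        # Lightweight router scoring the experts from each token's features.
        self.router = nn.Linear(dim, num_experts)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim)
        gate = F.softmax(self.router(tokens), dim=-1)            # (B, T, E)
        # Blend expert parameters per token, so each token gets its own weights.
        w = torch.einsum('bte,eio->btio', gate, self.expert_w)   # (B, T, D, D)
        b = torch.einsum('bte,eo->bto', gate, self.expert_b)     # (B, T, D)
        # Apply the token-specific linear transform.
        return torch.einsum('bti,btio->bto', tokens, w) + b

# Example: 196 tokens per frame with 64-dim features; tokens in different
# semantic regions end up transformed by differently blended experts.
x = torch.randn(2, 196, 64)
layer = SemanticMoELinear(dim=64, num_experts=4)
y = layer(x)   # (2, 196, 64)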

Related Material

[pdf] [supp]
[bibtex]
@InProceedings{Lee_2023_ICCV,
    author    = {Lee, Eunhye and Yoo, Jinsu and Yang, Yunjeong and Baik, Sungyong and Kim, Tae Hyun},
    title     = {Semantic-Aware Dynamic Parameter for Video Inpainting Transformer},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {12949-12958}
}