From Image to Video Face Inpainting: Spatial-Temporal Nested GAN (STN-GAN) for Usability Recovery

Yifan Wu, Vivek Singh, Ankur Kapoor; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 2396-2405

Abstract
In this paper, we propose to use constrained inpainting methods to recover the usability of corrupted images. Here we focus on the example of face images that are masked for privacy protection, yet where complete images are required for further algorithm development. The task is tackled in a progressive manner: 1) the generated images should look realistic; 2) the generated images must satisfy spatial constraints, if available; 3) when applied to video data, temporal consistency should be retained. We first present a spatial inpainting framework that synthesizes face images while incorporating spatial constraints, provided as facial landmark positions, and show that it outperforms state-of-the-art methods. Next, we propose the Spatial-Temporal Nested GAN (STN-GAN), which adapts the image inpainting framework, trained on 200k images, to video data by incorporating temporal information using residual blocks. Experiments on multiple public datasets show that STN-GAN attains spatio-temporal consistency effectively and efficiently. Furthermore, we show that the spatial constraints can be perturbed to obtain different inpainted results from a single source.
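A common way to feed such spatial constraints to an inpainting generator is to render the landmark positions as a heatmap channel and stack it with the masked image and the binary mask. The sketch below illustrates this conditioning step only; the function names, Gaussian rendering, and channel layout are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def landmark_heatmap(shape, landmarks, sigma=2.0):
    """Render facial-landmark positions as a single Gaussian heatmap channel.

    Each landmark (x, y) contributes a Gaussian bump; overlaps keep the max.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros(shape, dtype=np.float32)
    for (x, y) in landmarks:
        bump = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        heat = np.maximum(heat, bump.astype(np.float32))
    return heat

def generator_input(image, mask, landmarks):
    """Stack the masked image, the binary mask, and the landmark heatmap.

    image:     H x W x 3 float array in [0, 1]
    mask:      H x W binary array (1 = corrupted/hidden region)
    landmarks: list of (x, y) pixel coordinates (the spatial constraints)
    Returns an H x W x 5 conditioning tensor for the generator.
    """
    masked = image * (1.0 - mask[..., None])          # zero out hidden pixels
    heat = landmark_heatmap(mask.shape, landmarks)    # spatial constraint channel
    return np.concatenate(
        [masked, mask[..., None], heat[..., None]], axis=-1
    )
```

Perturbing the landmark coordinates before rendering the heatmap is then enough to steer the generator toward different plausible completions from the same source, as described in the abstract.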

Related Material
[bibtex]
@InProceedings{Wu_2020_WACV,
author = {Wu, Yifan and Singh, Vivek and Kapoor, Ankur},
title = {From Image to Video Face Inpainting: Spatial-Temporal Nested GAN (STN-GAN) for Usability Recovery},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {March},
year = {2020}
}