A Latent Transformer for Disentangled Face Editing in Images and Videos

Xu Yao, Alasdair Newson, Yann Gousseau, Pierre Hellier; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 13789-13798

Abstract


High-quality facial image editing is a challenging problem in the movie post-production industry, requiring a high degree of control and identity preservation. Previous works that attempt to tackle this problem may suffer from the entanglement of facial attributes and the loss of the person's identity. Furthermore, many algorithms are limited to a single task. To address these limitations, we propose to edit facial attributes via the latent space of a StyleGAN generator, by training a dedicated latent transformation network and incorporating explicit disentanglement and identity preservation terms in the loss function. We further introduce a pipeline to generalize our face editing to videos. Our model achieves disentangled, controllable, and identity-preserving facial attribute editing, even in the challenging case of real (i.e., non-synthetic) images and videos. We conduct extensive experiments on image and video datasets and show that our model outperforms other state-of-the-art methods in visual quality and quantitative evaluation. Source code is available at https://github.com/InterDigitalInc/latent-transformer.
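The abstract's core idea, editing a face by moving its latent code while keeping the edit disentangled from other attributes and close to the original identity, can be illustrated with a minimal sketch. This is not the paper's learned latent transformer network; it uses a simple linear attribute direction as a stand-in, and all names (`d_smile`, `d_age`, the orthogonalization step) are illustrative assumptions:

```python
import numpy as np

# Hypothetical sketch of latent-space attribute editing: the paper trains a
# latent transformer; here a learned attribute direction approximates it.
rng = np.random.default_rng(0)

latent_dim = 512                      # StyleGAN W-space dimensionality
w = rng.standard_normal(latent_dim)   # latent code of an inverted face image

# Assume two learned attribute directions; a disentanglement term would push
# the "smile" direction to be independent of other attributes such as "age".
d_smile = rng.standard_normal(latent_dim)
d_age = rng.standard_normal(latent_dim)

# Crude disentanglement proxy: remove the component of d_smile along d_age.
d_smile -= (d_smile @ d_age) / (d_age @ d_age) * d_age
d_smile /= np.linalg.norm(d_smile)    # unit-norm editing direction

alpha = 1.5                           # editing strength (user-controllable)
w_edit = w + alpha * d_smile          # the edit itself

# Identity-preservation proxy: the edited code stays close to the original
# (the paper uses an explicit identity term in the loss instead).
identity_drift = np.linalg.norm(w_edit - w)   # equals |alpha| for a unit direction
```

In the actual method the direction would be produced by a trained transformation network conditioned on the attribute, and the disentanglement and identity constraints are enforced through the loss rather than by projection.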

Related Material


[bibtex]
@InProceedings{Yao_2021_ICCV,
  author    = {Yao, Xu and Newson, Alasdair and Gousseau, Yann and Hellier, Pierre},
  title     = {A Latent Transformer for Disentangled Face Editing in Images and Videos},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2021},
  pages     = {13789-13798}
}