Robust One-Shot Face Video Re-enactment using Hybrid Latent Spaces of StyleGAN2

Oorloff, Trevine; Yacoob, Yaser

Trevine Oorloff, Yaser Yacoob; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 20947-20957

Abstract

Recent research on one-shot face re-enactment has progressively overcome the low-resolution constraint with the help of StyleGAN's high-fidelity portrait generation. However, such approaches rely on explicit 2D/3D structural priors for guidance and/or use flow-based warping, both of which constrain their performance. Moreover, existing methods are sensitive to the source frame's facial expressions and head pose, even though ideally only the identity of the source frame should have an effect. Addressing these limitations, we propose a novel framework that exploits the implicit 3D prior and inherent latent properties of StyleGAN2 to facilitate one-shot face re-enactment at 1024x1024 that (1) has zero dependencies on explicit structural priors, (2) accommodates attribute edits, and (3) is robust to diverse facial expressions and head poses of the source frame. We train an encoder using a self-supervised approach to decompose the identity and facial deformation of a portrait image within the pre-trained StyleGAN2's predefined latent spaces themselves (automatically facilitating (1) and (2)). The decomposed identity latent of the source and the facial deformation latents of the driving sequence are used to generate re-enacted frames using the StyleGAN2 generator. Additionally, to improve the identity reconstruction and to enable seamless transfer of driving motion, we propose a novel approach, Cyclic Manifold Adjustment. We perform extensive qualitative and quantitative analyses, which demonstrate the superiority of the proposed approach over state-of-the-art methods. Project page: https://trevineoorloff.github.io/FaceVideoReenactment_HybridLatents.io/.
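The data flow described in the abstract (one identity latent from the source frame, one deformation latent per driving frame, both passed to a generator) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `toy_encoder`, `toy_generator`, `reenact`, and the latent sizes are hypothetical stand-ins rather than the authors' encoder, the StyleGAN2 generator, or their latent spaces, and Cyclic Manifold Adjustment is not modeled.

```python
from typing import List, Sequence, Tuple

ID_DIM = 4   # toy identity-latent size (illustrative assumption)
DEF_DIM = 3  # toy deformation-latent size (illustrative assumption)

Latent = List[float]


def toy_encoder(frame: Sequence[float]) -> Tuple[Latent, Latent]:
    """Hypothetical stand-in for the trained encoder.

    Splits a toy 'frame' vector into (identity, deformation) latents.
    """
    return list(frame[:ID_DIM]), list(frame[ID_DIM:ID_DIM + DEF_DIM])


def toy_generator(identity: Latent, deformation: Latent) -> Latent:
    """Hypothetical stand-in for the generator: joins the two latents."""
    return identity + deformation


def reenact(source_frame: Sequence[float],
            driving_frames: Sequence[Sequence[float]]) -> List[Latent]:
    """Combine the source identity with each driving frame's deformation."""
    # Only the identity of the source is kept; its own deformation
    # (expression and head pose) is discarded.
    identity, _ = toy_encoder(source_frame)
    return [toy_generator(identity, toy_encoder(d)[1]) for d in driving_frames]


source = [1.0, 2.0, 3.0, 4.0, 0.1, 0.2, 0.3]
driving = [[9.0] * 4 + [0.5, 0.6, 0.7],
           [8.0] * 4 + [-0.5, -0.6, -0.7]]
frames = reenact(source, driving)
```

In this toy setting, changing only the last three values of `source` leaves `frames` unchanged, which mirrors the stated goal that only the source identity should affect the output.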

Related Material

[bibtex]

@InProceedings{Oorloff_2023_ICCV,
    author    = {Oorloff, Trevine and Yacoob, Yaser},
    title     = {Robust One-Shot Face Video Re-enactment using Hybrid Latent Spaces of StyleGAN2},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {20947-20957}
}