EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars

Drobyshev, Nikita; Casademunt, Antoni Bigata; Vougioukas, Konstantinos; Landgraf, Zoe; Petridis, Stavros; Pantic, Maja

Nikita Drobyshev, Antoni Bigata Casademunt, Konstantinos Vougioukas, Zoe Landgraf, Stavros Petridis, Maja Pantic; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 8498-8507

Abstract

Head avatars animated by visual signals have gained popularity, particularly in cross-driving synthesis, where the driver differs from the animated character, a challenging but highly practical approach. The recently presented MegaPortraits model has demonstrated state-of-the-art results in this domain. We conduct a deep examination and evaluation of this model, with a particular focus on its latent space for facial expression descriptors, and uncover several limitations in its ability to express intense face motions. To address these limitations, we propose substantial changes in both the training pipeline and the model architecture to introduce our EMOPortraits model, where we: (1) enhance the model's capability to faithfully support intense, asymmetric face expressions, setting a new state-of-the-art result in the emotion transfer task and surpassing previous methods in both metrics and quality; and (2) incorporate a speech-driven mode into our model, achieving top-tier performance in audio-driven facial animation and making it possible to drive the source identity through diverse modalities, including visual signal, audio, or a blend of both. Furthermore, we propose a novel multi-view video dataset featuring a wide range of intense and asymmetric facial expressions, filling the gap left by the absence of such data in existing datasets.

Related Material

@InProceedings{Drobyshev_2024_CVPR,
    author    = {Drobyshev, Nikita and Casademunt, Antoni Bigata and Vougioukas, Konstantinos and Landgraf, Zoe and Petridis, Stavros and Pantic, Maja},
    title     = {EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {8498-8507}
}