@InProceedings{Aneja_2024_CVPR,
    author    = {Aneja, Shivangi and Thies, Justus and Dai, Angela and Nie{\ss}ner, Matthias},
    title     = {FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {21263-21273}
}
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
Abstract
We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from an input audio signal. To capture the expressive, detailed nature of human heads, including hair, ears, and finer-scale eye movements, we propose to couple the speech signal with the latent space of neural parametric head models (NPHMs) to create high-fidelity, temporally coherent motion sequences. We propose a new latent diffusion model for this task, operating in the expression space of neural parametric head models, to synthesize audio-driven realistic head sequences. In the absence of a dataset with corresponding NPHM expressions to audio, we optimize for these correspondences to produce a dataset of temporally-optimized NPHM expressions fit to audio-video recordings of people talking. To the best of our knowledge, this is the first work to propose a generative approach for realistic and high-quality motion synthesis of volumetric human heads, representing a significant advancement in the field of audio-driven 3D animation. Notably, our approach stands out in its ability to generate plausible motion sequences that can produce high-fidelity head animation coupled with the NPHM shape space. Our experimental results substantiate the effectiveness of FaceTalk, consistently achieving superior and visually natural motion, encompassing diverse facial expressions and styles, outperforming existing methods by 75% in perceptual user study evaluation.