DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars

Tobias Kirschstein, Simon Giebenhain, Matthias Nießner; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 5481-5492

Abstract

DiffusionAvatars synthesizes a high-fidelity 3D head avatar of a person, offering intuitive control over both pose and expression. We propose a diffusion-based neural renderer that leverages generic 2D priors to produce compelling images of faces. For coarse guidance of the expression and head pose, we render a neural parametric head model (NPHM) from the target viewpoint, which acts as a proxy geometry of the person. Additionally, to enhance the modeling of intricate facial expressions, we condition DiffusionAvatars directly on the expression codes obtained from NPHM via cross-attention. Finally, to synthesize consistent surface details across different viewpoints and expressions, we rig learnable spatial features to the head's surface via TriPlane lookup in NPHM's canonical space. We train DiffusionAvatars on RGB videos and corresponding fitted NPHM meshes of a person and test the obtained avatars in both self-reenactment and animation scenarios. Our experiments demonstrate that DiffusionAvatars generates temporally consistent and visually appealing videos for novel poses and expressions of a person, outperforming existing approaches.
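To make the two conditioning mechanisms mentioned above concrete, the sketch below illustrates a TriPlane feature lookup in canonical space: three learnable axis-aligned feature planes are bilinearly sampled at the 2D projections of canonical-space points and summed into one descriptor per point. This is a minimal PyTorch illustration under assumed dimensions, not the authors' released implementation; the class name `TriPlaneFeatures` and all hyperparameters are hypothetical.

```python
import torch
import torch.nn.functional as F

class TriPlaneFeatures(torch.nn.Module):
    """Illustrative TriPlane lookup (hypothetical, not the paper's code).

    Three axis-aligned 2D feature planes (XY, XZ, YZ) are bilinearly
    sampled at the projections of 3D points in a canonical space, and
    the per-plane features are summed into one descriptor per point.
    """

    def __init__(self, feat_dim: int = 32, resolution: int = 256):
        super().__init__()
        # Three learnable planes, stored as a (3, C, R, R) tensor.
        self.planes = torch.nn.Parameter(
            torch.randn(3, feat_dim, resolution, resolution) * 0.01
        )

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (N, 3) canonical-space points, assumed normalized to [-1, 1].
        # Project each point onto the XY, XZ, and YZ planes.
        coords = torch.stack(
            [pts[:, [0, 1]], pts[:, [0, 2]], pts[:, [1, 2]]], dim=0
        )  # (3, N, 2)
        grid = coords.unsqueeze(1)  # grid_sample expects (B, H_out, W_out, 2)
        sampled = F.grid_sample(
            self.planes, grid, mode="bilinear", align_corners=True
        )  # (3, C, 1, N)
        # Sum the three per-plane features and return (N, C) descriptors.
        return sampled.sum(dim=0).squeeze(1).transpose(0, 1)

# Usage: query features at 1000 random canonical-space points.
feats = TriPlaneFeatures()(torch.rand(1000, 3) * 2 - 1)  # (1000, 32)
```

In the described pipeline, such per-point features would be rigged to the NPHM surface and rendered into screen space via the proxy mesh before conditioning the diffusion renderer. The cross-attention conditioning on NPHM expression codes could likewise be sketched as follows; the dimensions and the use of `torch.nn.MultiheadAttention` are assumptions for illustration only.

```python
import torch

# Hypothetical cross-attention step: flattened renderer feature tokens
# attend to a single per-frame NPHM expression code (sizes are assumed).
feat_dim, code_dim = 320, 64
attn = torch.nn.MultiheadAttention(
    embed_dim=feat_dim, num_heads=8, kdim=code_dim, vdim=code_dim,
    batch_first=True,
)
unet_tokens = torch.randn(1, 64 * 64, feat_dim)  # (B, H*W, C) feature map
expr_code = torch.randn(1, 1, code_dim)          # one expression code per frame
out, _ = attn(query=unet_tokens, key=expr_code, value=expr_code)
print(out.shape)  # torch.Size([1, 4096, 320])
```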

Related Material

BibTeX
@InProceedings{Kirschstein_2024_CVPR,
    author    = {Kirschstein, Tobias and Giebenhain, Simon and Nie{\ss}ner, Matthias},
    title     = {DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {5481-5492}
}