FitDiff: Robust Monocular 3D Facial Shape and Reflectance Estimation using Diffusion Models

Stathis Galanakis, Alexandros Lattas, Stylianos Moschoglou, Stefanos Zafeiriou
Abstract
The remarkable progress in 3D face reconstruction has resulted in high-detail and photorealistic facial representations. Recently, Diffusion Models have revolutionized the capabilities of generative methods by surpassing the performance of GANs. In this work, we present FitDiff, a diffusion-based 3D facial avatar generative model. Leveraging diffusion principles, our model accurately generates relightable facial avatars, utilizing an identity embedding extracted from an "in-the-wild" 2D facial image. The introduced multi-modal diffusion model concurrently outputs facial reflectance maps (diffuse and specular albedo and normals) and shapes, showcasing great generalization capabilities. It is solely trained on an annotated subset of a public facial dataset, paired with 3D reconstructions. We revisit the typical 3D facial fitting approach by guiding a reverse diffusion process using perceptual and face recognition losses. Being the first LDM conditioned on face recognition embeddings, FitDiff reconstructs relightable human avatars that can be used as-is in common rendering engines, starting only from an unconstrained facial image and achieving state-of-the-art performance.
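The fitting step described in the abstract, guiding a reverse diffusion process with identity-aware losses, can be illustrated with a minimal PyTorch sketch. This is not the paper's implementation: the callables denoiser (an identity-conditioned noise-prediction network), decode_and_render (latent-to-image rendering), and id_encoder (a face recognition network) are hypothetical stand-ins, the DDPM schedule is a placeholder, and only a single identity cosine loss is shown where the paper combines perceptual and face recognition losses.

import torch

# Toy DDPM noise schedule; the paper's actual schedule and step count
# are not specified here, so these values are placeholders.
T = 50
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def guided_reverse_diffusion(denoiser, decode_and_render, id_encoder,
                             target_emb, shape, guidance_scale=1.0):
    """Sample a latent while steering each step toward a target identity.

    All three callables are hypothetical stand-ins, not FitDiff's API:
    - denoiser(z, t, emb): identity-conditioned noise prediction
    - decode_and_render(z0): maps a clean latent to a rendered face image
    - id_encoder(img): face recognition embedding of an image
    """
    z = torch.randn(shape)                       # start from pure noise
    for t in reversed(range(T)):
        z = z.detach().requires_grad_(True)
        eps = denoiser(z, t, target_emb)         # predicted noise
        a_bar = alpha_bars[t]
        # DDPM estimate of the clean latent from the current noisy one.
        z0_hat = (z - torch.sqrt(1 - a_bar) * eps) / torch.sqrt(a_bar)
        # Identity guidance: cosine distance between the embedding of the
        # rendered estimate and the target identity embedding (perceptual
        # losses from the paper are omitted here for brevity).
        rendered = decode_and_render(z0_hat)
        id_loss = 1 - torch.cosine_similarity(
            id_encoder(rendered), target_emb, dim=-1).mean()
        grad = torch.autograd.grad(id_loss, z)[0]
        # Standard DDPM posterior mean, nudged against the loss gradient.
        mean = (z - betas[t] / torch.sqrt(1 - a_bar) * eps) / torch.sqrt(alphas[t])
        mean = mean - guidance_scale * grad
        noise = torch.randn_like(z) if t > 0 else torch.zeros_like(z)
        z = (mean + torch.sqrt(betas[t]) * noise).detach()
    return z

The sketch mirrors classifier-guidance-style sampling: the unconditional denoising update is biased by the gradient of a task loss evaluated on the current clean-sample estimate, which is how loss-guided sampling can replace the iterative optimization of a typical 3D face-fitting pipeline.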
Related Material

[pdf] [supp] [arXiv]

[bibtex]
@InProceedings{Galanakis_2025_WACV,
  author    = {Galanakis, Stathis and Lattas, Alexandros and Moschoglou, Stylianos and Zafeiriou, Stefanos},
  title     = {FitDiff: Robust Monocular 3D Facial Shape and Reflectance Estimation using Diffusion Models},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {992-1004}
}