Self-Supervised 3D Face Reconstruction via Conditional Estimation

Yandong Wen, Weiyang Liu, Bhiksha Raj, Rita Singh; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 13289-13298


We present a conditional estimation (CEST) framework to learn 3D facial parameters from 2D single-view images by self-supervised training from videos. CEST is based on the process of analysis by synthesis, where the 3D facial parameters (shape, reflectance, viewpoint, and illumination) are estimated from the face image, and then recombined to reconstruct the 2D face image. In order to learn semantically meaningful 3D facial parameters without explicit access to their labels, CEST couples the estimation of different 3D facial parameters by taking their statistical dependency into account. Specifically, the estimation of any 3D facial parameter is not only conditioned on the given image, but also on the facial parameters that have already been derived. Moreover, the reflectance symmetry and consistency among the video frames are adopted to improve the disentanglement of facial parameters. Together with a novel strategy for incorporating the reflectance symmetry and consistency, CEST can be efficiently trained with in-the-wild video clips. Both qualitative and quantitative experiments demonstrate the effectiveness of CEST.
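The coupling described above, where each parameter is estimated from the image and from the parameters already derived, can be illustrated with a small sketch. It is not the authors' implementation: the estimation order, the toy estimators, and every function name here (`make_toy_estimator`, `conditional_estimate`, `symmetry_penalty`, `consistency_penalty`) are assumptions made for illustration, and the two penalties are generic stand-ins rather than the paper's strategy for incorporating reflectance symmetry and consistency.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
# An estimator sees the image plus every parameter derived so far.
Estimator = Callable[[Vector, Dict[str, Vector]], Vector]

# Assumed estimation order; the abstract does not specify one.
ORDER = ("viewpoint", "shape", "reflectance", "illumination")


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def make_toy_estimator(scale: float) -> Estimator:
    """Hypothetical stand-in for a learned network."""
    def estimate(image: Vector, known: Dict[str, Vector]) -> Vector:
        # Flatten all previously estimated parameters into one context.
        context = [v for params in known.values() for v in params]
        bias = mean(context) if context else 0.0
        return [scale * mean(image) + bias]
    return estimate


def conditional_estimate(
    image: Vector,
    estimators: Dict[str, Estimator],
    order: Sequence[str] = ORDER,
) -> Dict[str, Vector]:
    """Estimate parameters one by one, each conditioned on the earlier ones."""
    known: Dict[str, Vector] = {}
    for name in order:
        # Pass a copy so an estimator cannot mutate the running state.
        known[name] = estimators[name](image, dict(known))
    return known


def symmetry_penalty(reflectance_row: Vector) -> float:
    """Mean squared difference between a row and its horizontal flip."""
    flipped = list(reversed(reflectance_row))
    return mean([(a - b) ** 2 for a, b in zip(reflectance_row, flipped)])


def consistency_penalty(per_frame: List[Vector]) -> float:
    """Mean squared deviation of per-frame reflectance from the clip mean."""
    width = len(per_frame[0])
    clip_mean = [mean([frame[i] for frame in per_frame]) for i in range(width)]
    return mean(
        [(v - m) ** 2 for frame in per_frame for v, m in zip(frame, clip_mean)]
    )


if __name__ == "__main__":
    toy = {name: make_toy_estimator(1.0) for name in ORDER}
    print(conditional_estimate([1.0, 3.0], toy))
    print(symmetry_penalty([0.2, 0.5, 0.2]))
    print(consistency_penalty([[0.1, 0.4], [0.3, 0.4]]))
```

In this sketch the later estimators receive a growing dictionary of earlier results, which is the chained conditioning the abstract describes; in a real system each estimator would be a trained network and the penalties would operate on reflectance maps rather than short lists.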

Related Material

@InProceedings{Wen_2021_ICCV,
    author    = {Wen, Yandong and Liu, Weiyang and Raj, Bhiksha and Singh, Rita},
    title     = {Self-Supervised 3D Face Reconstruction via Conditional Estimation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {13289-13298}
}