Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

Yu, Cuican; Lu, Guansong; Zeng, Yihan; Sun, Jian; Liang, Xiaodan; Li, Huibin; Xu, Zongben; Xu, Songcen; Zhang, Wei; Xu, Hang

Cuican Yu, Guansong Lu, Yihan Zeng, Jian Sun, Xiaodan Liang, Huibin Li, Zongben Xu, Songcen Xu, Wei Zhang, Hang Xu; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 15326-15337

Abstract

Generating 3D faces from textual descriptions has a multitude of applications, such as gaming, movies, and robotics. Recent progress has demonstrated the success of unconditional 3D face generation and text-to-3D shape generation. However, because paired text-3D face data is limited, text-driven 3D face generation remains an open problem. In this paper, we propose a text-guided 3D face generation method, referred to as TG-3DFace, for generating realistic 3D faces using text guidance. Specifically, we adopt an unconditional 3D face generation framework and equip it with text conditions, so that it learns text-guided 3D face generation from text-2D face data alone. On top of that, we propose two text-to-face cross-modal alignment techniques, global contrastive learning and a fine-grained alignment module, to facilitate high semantic consistency between generated 3D faces and input texts. In addition, we present directional classifier guidance during inference, which encourages creativity for out-of-domain generations. Compared to existing methods, TG-3DFace creates more realistic and aesthetically pleasing 3D faces, improving multi-view consistency (MVIC) by 9% over Latent3D. The rendered face images generated by TG-3DFace achieve better FID and CLIP scores than text-to-2D face/image generation models, demonstrating its superiority in generating realistic and semantically consistent textures.

Related Material

[bibtex]

@InProceedings{Yu_2023_ICCV,
    author    = {Yu, Cuican and Lu, Guansong and Zeng, Yihan and Sun, Jian and Liang, Xiaodan and Li, Huibin and Xu, Zongben and Xu, Songcen and Zhang, Wei and Xu, Hang},
    title     = {Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {15326-15337}
}