Superior and pragmatic talking face generation with teacher-student framework

Chao Liang, Jianwen Jiang, Tianyun Zhong, Gaojie Lin, Zhengkun Rong, Yongming Zhu, Jiaqi Yang, Xin Chen; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 2035-2044

Abstract


Talking head generation creates talking videos from arbitrary appearance and motion signals. Existing methods perform well with standard inputs but suffer serious performance degradation on complex real-world examples. Efficiency is also an important concern in deployment. To comprehensively address these issues, we introduce SuperFace, a teacher-student framework that balances quality, robustness, cost and editability. We first propose a simple but effective 3D-aware teacher model capable of handling inputs of varying quality and generating high-quality results. Building on this, we devise an efficient distillation strategy to transfer the 3D knowledge into an identity-specific 2D student model. This approach maintains generation quality while significantly reducing computational load. Experiments validate that SuperFace offers a more comprehensive solution for the above objectives, notably reducing FLOPs by 99% with the student model. SuperFace supports audio, video and hybrid driving settings and allows localized editing of facial attributes.

Related Material


[bibtex]
@InProceedings{Liang_2025_ICCV,
    author    = {Liang, Chao and Jiang, Jianwen and Zhong, Tianyun and Lin, Gaojie and Rong, Zhengkun and Zhu, Yongming and Yang, Jiaqi and Chen, Xin},
    title     = {Superior and pragmatic talking face generation with teacher-student framework},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {2035-2044}
}