Let's Chorus: Partner-aware Hybrid Song-Driven 3D Head Animation

Xiumei Xie, Zikai Huang, Wenhao Xu, Peng Xiao, Xuemiao Xu, Huaidong Zhang; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 5467-5476

Abstract


Singing is a vital form of human emotional expression and social interaction, distinguished from speech by its richer emotional nuances and freer expressive style. Investigating 3D facial animation driven by singing therefore holds significant research value. Our work focuses on 3D singing facial animation driven by mixed singing audio; to the best of our knowledge, no prior studies have explored this area. Additionally, the absence of existing 3D singing datasets poses a considerable challenge. To address this, we collect a novel audiovisual dataset, ChorusHead, which features synchronized mixed vocal audio and pseudo-3D FLAME motions for chorus singing. In addition, we propose a partner-aware 3D chorus head generation framework driven by mixed audio inputs. The framework extracts emotional features from the background music, captures the dependence between singers, and models head movement in a latent space learned by a Variational Autoencoder (VAE), enabling diverse interactive head-animation generation. Extensive experiments demonstrate that our approach effectively generates 3D facial animations of interacting singers, achieving notable improvements in realism and robustly handling background-music interference.
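
The abstract describes the architecture only at a high level, and this page gives no implementation details. As a purely illustrative sketch, the PyTorch module below shows one plausible way to structure a partner-aware, audio-conditioned motion VAE of the kind described: the singer's FLAME pose sequence attends to the partner's sequence via cross-attention, features from the mixed audio condition both encoder and decoder, and a Gaussian latent enables diverse sampling. All names, dimensions, and design choices here are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class PartnerAwareMotionVAE(nn.Module):
    """Hypothetical conditional VAE sketch (not the paper's code).

    Encodes one singer's FLAME head-pose sequence into a latent,
    conditioned on mixed-audio features and on the partner's motion
    via cross-attention."""

    def __init__(self, pose_dim=6, audio_dim=128, latent_dim=64, hidden=256):
        super().__init__()
        self.pose_proj = nn.Linear(pose_dim, hidden)
        self.audio_proj = nn.Linear(audio_dim, hidden)
        # Partner awareness: the singer's motion attends to the partner's.
        self.partner_attn = nn.MultiheadAttention(hidden, num_heads=4,
                                                  batch_first=True)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.GRU(hidden + latent_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, pose_dim)

    def forward(self, pose, partner_pose, audio_feat):
        # pose, partner_pose: (B, T, pose_dim); audio_feat: (B, T, audio_dim)
        h = self.pose_proj(pose)
        h_partner = self.pose_proj(partner_pose)
        h, _ = self.partner_attn(h, h_partner, h_partner)  # partner-aware
        h = h + self.audio_proj(audio_feat)                # audio condition
        _, last = self.encoder(h)                          # last: (1, B, hidden)
        mu, logvar = self.to_mu(last[-1]), self.to_logvar(last[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        cond = self.audio_proj(audio_feat)
        z_seq = z.unsqueeze(1).expand(-1, pose.size(1), -1)
        dec, _ = self.decoder(torch.cat([cond, z_seq], dim=-1))
        return self.out(dec), mu, logvar

A typical VAE objective for this sketch would combine pose reconstruction with a KL term, e.g.:

model = PartnerAwareMotionVAE()
pose = torch.randn(2, 100, 6)  # two 100-frame FLAME pose sequences
recon, mu, logvar = model(pose, torch.randn(2, 100, 6), torch.randn(2, 100, 128))
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = nn.functional.mse_loss(recon, pose) + 1e-3 * kl

At inference, sampling different z from the Gaussian prior would yield the diverse head animations the abstract mentions; the attention over the partner's motion is one way to realize the stated "dependence between singers".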

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Xie_2025_CVPR,
    author    = {Xie, Xiumei and Huang, Zikai and Xu, Wenhao and Xiao, Peng and Xu, Xuemiao and Zhang, Huaidong},
    title     = {Let's Chorus: Partner-aware Hybrid Song-Driven 3D Head Animation},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {5467-5476}
}