Quality Assessment for Talking Head Videos via Multi-modal Feature Representation

Mengjing Su, Yi Wang, Tuo Chen, Chunxiao Li, Shuaiyu Zhao, Jiaxin Wen, Chuyi Lin, Sitong Liu, Ningxin Chu, Yu Zhou; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2025, pp. 1414-1420

Abstract


The broad application prospects of digital humans have drawn growing attention to the quality evaluation of talking head (TH) videos. Most existing TH video quality assessment (THVQA) methods focus on visual distortions and neglect the impact of audio distortions on overall quality, which limits their performance. To address this problem, we propose a no-reference multi-modal feature representation model for THVQA that jointly considers visual and auditory information. Visual features are extracted from both the spatial and temporal domains via window-based self-attention and a two-stream structure operating at different frame rates. In parallel, audio clips are first processed with four audio analysis techniques and then fed into a separable convolution, after which the audio features are input into a Bi-LSTM network. Finally, the extracted multi-modal features are integrated and regressed into a quality score. Experimental results demonstrate the state-of-the-art performance of the proposed method, which also ranks first in the Talking Head track of the NTIRE 2025 XGC Quality Assessment Challenge. This research contributes to the optimization and design of digital-human-related technologies.
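As an illustration of the two-stream idea mentioned in the abstract, the following is a minimal sketch of how one video might be sampled at two different frame rates, giving a sparse stream and a dense stream. The abstract does not state the actual frame rates or sampling scheme, so the function name `sample_two_stream` and the strides `slow_stride` and `fast_stride` are hypothetical choices made only for this example.

```python
def sample_two_stream(num_frames, slow_stride=8, fast_stride=2):
    """Return frame indices for a low-frame-rate and a high-frame-rate stream.

    The strides are illustrative assumptions, not values from the paper.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if slow_stride <= 0 or fast_stride <= 0:
        raise ValueError("strides must be positive")
    # Sparse sampling: few frames, each one carrying mostly spatial detail.
    slow = list(range(0, num_frames, slow_stride))
    # Dense sampling: many frames, capturing finer temporal changes.
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


slow_idx, fast_idx = sample_two_stream(32)
print(slow_idx)       # indices for the low-frame-rate stream
print(len(fast_idx))  # number of frames in the high-frame-rate stream
```

In a design of this kind, each index list would typically select the frames passed to a separate visual branch, and the two branch outputs would be combined before fusion with the audio features. How the proposed model does this is not specified in the abstract.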

Related Material


@InProceedings{Su_2025_CVPR,
    author    = {Su, Mengjing and Wang, Yi and Chen, Tuo and Li, Chunxiao and Zhao, Shuaiyu and Wen, Jiaxin and Lin, Chuyi and Liu, Sitong and Chu, Ningxin and Zhou, Yu},
    title     = {Quality Assessment for Talking Head Videos via Multi-modal Feature Representation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {1414-1420}
}