Language-guided Multi-modal Emotional Mimicry Intensity Estimation

Feng Qiu, Wei Zhang, Chen Liu, Lincheng Li, Heming Du, Tianchen Guo, Xin Yu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 4742-4751

Abstract


Emotional Mimicry Intensity (EMI) estimation aims to identify the intensity of mimicry exhibited by individuals in response to observed emotions. The challenge in EMI estimation lies in discerning nuanced facial expression cues in mimicry behaviors based on the seed video and the text instructions. In this paper, we propose a multi-modal EMI estimation framework that leverages visual, auditory, and textual modalities to capture a comprehensive emotional profile. We first extract representations for each modality separately and then fuse the modality-specific representations via a Temporal Segment Network, optimizing for temporal coherence and emotional context. Furthermore, we find that participants demonstrate notable proficiency in mimicking text instructions yet are less effective at replicating facial expressions and vocal tones. In light of this, we design a contrastive learning mechanism to refine the extracted features based on textual guidance. By doing so, features derived from similar text instructions are closely aligned, enhancing the estimation of emotional mimicry intensity by leveraging the dominant textual modality. Experiments conducted on the Hume-Vidmimic2 dataset illustrate the effectiveness of our framework in EMI estimation. Our framework is recognized as the leading solution in the Emotional Mimicry Intensity (EMI) Estimation Challenge at the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW). More information about the competition can be found on the 6th ABAW page: https://affective-behavior-analysis-in-the-wild.github.io/6th/

Related Material


[bibtex]
@InProceedings{Qiu_2024_CVPR,
    author    = {Qiu, Feng and Zhang, Wei and Liu, Chen and Li, Lincheng and Du, Heming and Guo, Tianchen and Yu, Xin},
    title     = {Language-guided Multi-modal Emotional Mimicry Intensity Estimation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {4742-4751}
}