Multi-Modal Emotion Reaction Intensity Estimation With Temporal Augmentation

Feng Qiu, Bowen Ma, Wei Zhang, Yu Ding; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 5777-5784

Abstract


Emotion reaction intensity (ERI) estimation aims to estimate the intensities of the emotions that subjects express while reacting to various video-based stimuli. It plays an important role in human affective behavior analysis. In this paper, we propose an effective solution for the ERI Estimation task of the fifth Affective Behavior Analysis in-the-wild (ABAW) competition. Using multi-modal information, we first extract uni-modal features from images, speech, and text, and then regress the intensities of 7 emotions. To improve model generalization and capture context information, we employ a Temporal Augmentation module to adapt to various video samples and a Temporal SE Block to reweight temporal features adaptively. Extensive experiments on the large-scale Hume-Reaction dataset demonstrate the effectiveness of our approach. Our method achieves an average Pearson's correlation coefficient of 0.4160 on the validation set and obtained third place in the ERI Estimation Challenge of ABAW 2023.
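The reported score is an average of Pearson's correlation coefficients. The sketch below shows one way to compute such a score, assuming that a Pearson coefficient is computed for each emotion across all samples and that the 7 per-emotion coefficients are then averaged, i.e. mean over k of rho_k. The function names `pearson` and `average_pearson` are hypothetical and are not taken from the authors' code.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def average_pearson(preds: Sequence[Sequence[float]],
                    labels: Sequence[Sequence[float]]) -> float:
    """Mean of per-emotion Pearson correlations (hypothetical helper).

    Each row of preds/labels is one sample's vector of emotion
    intensities (7 values in the ERI task, assumed here).
    """
    num_emotions = len(labels[0])
    per_emotion: List[float] = []
    for k in range(num_emotions):
        # Column k: predicted vs. ground-truth intensity of emotion k.
        p_k = [row[k] for row in preds]
        y_k = [row[k] for row in labels]
        per_emotion.append(pearson(p_k, y_k))
    return sum(per_emotion) / num_emotions


if __name__ == "__main__":
    # Toy data with 2 emotions: the first is perfectly correlated,
    # the second perfectly anti-correlated, so the average is 0.
    preds = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]
    labels = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    print(average_pearson(preds, labels))
```

Averaging per-emotion coefficients, rather than pooling all emotions into one correlation, keeps emotions with larger intensity ranges from dominating the score.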

Related Material


@InProceedings{Qiu_2023_CVPR,
    author    = {Qiu, Feng and Ma, Bowen and Zhang, Wei and Ding, Yu},
    title     = {Multi-Modal Emotion Reaction Intensity Estimation With Temporal Augmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {5777-5784}
}