Multi-Modal Emotion Reaction Intensity Estimation With Temporal Augmentation
Emotion reaction intensity (ERI) estimation aims to estimate the emotion intensities of subjects reacting to various video-based stimuli. It plays an important role in human affective behavior analysis. In this paper, we proposed a effective solution for addressing the task of ERI estimation in the fifth Affective Behavior Analysis in the wild (ABAW) competition. Based on multi-modal information, We first extract uni-modal features from images, speeches and texts, respectively and then regress the intensities of 7 emotions. To enhance the model generalization and capture context information, we employ the Temporal Augmentation module to adapt to various video samples and the Temporal SE Block to reweight temporal features adaptively. The extensive experiments conducted on large-scale dataset, Hume-Reaction, demonstrate the effectiveness of our approach. Our method achieves average pearson's correlations coefficient of 0.4160 on the validation set and obtain third place in the ERI Estimation Challenge of ABAW 2023.