EVAEF: Ensemble Valence-Arousal Estimation Framework in the Wild

Liu, Xiaolong; Sun, Lei; Jiang, Wenqiang; Zhang, Fengyuan; Deng, Yuanyuan; Huang, Zhaopei; Meng, Liyu; Liu, Yuchen; Liu, Chuanhe

Xiaolong Liu, Lei Sun, Wenqiang Jiang, Fengyuan Zhang, Yuanyuan Deng, Zhaopei Huang, Liyu Meng, Yuchen Liu, Chuanhe Liu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 5863-5871

Abstract

This paper presents our submission to the Valence-Arousal (VA) Estimation Challenge of the 5th Affective Behavior Analysis in-the-wild (ABAW) competition. We explore the problems in this VA challenge from three aspects: 1) To obtain efficient and robust feature representations, we explore the role of multiple visual and video feature extractors; 2) Based on multimodal feature representations that fuse the visual and video information, we utilize four types of temporal encoders to capture the temporal context information in the video: an LSTM, a GRU, a Transformer-based encoder, and a combined Transformer and LSTM encoder; 3) Five model ensemble strategies are used to combine multiple results obtained with different model settings. Our system achieves Concordance Correlation Coefficients (CCC) of 0.6193 for valence and 0.6634 for arousal, and a mean CCC of 0.6414 on the test set, which demonstrates the effectiveness of our proposed method and ranks first in the challenge.
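The reported metric can be illustrated with a minimal sketch. It assumes the standard (Lin's) definition of the Concordance Correlation Coefficient, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), and that the mean CCC is the plain average of the valence and arousal scores. The function names `ccc` and `mean_ccc` are hypothetical and are not taken from the paper or from the challenge's evaluation code.

```python
import numpy as np


def ccc(pred, target):
    """Concordance Correlation Coefficient between two 1-D sequences.

    Assumes the standard definition with population (biased) variance;
    the challenge's official scoring script may differ in detail.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mean_p, mean_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    # Population covariance between predictions and labels.
    cov = np.mean((pred - mean_p) * (target - mean_t))
    return 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)


def mean_ccc(ccc_valence, ccc_arousal):
    """Average of the valence and arousal CCC scores (assumed definition)."""
    return 0.5 * (ccc_valence + ccc_arousal)
```

Under this assumption, `mean_ccc(0.6193, 0.6634)` gives 0.64135, which agrees with the reported mean CCC of 0.6414 up to rounding. Unlike the Pearson correlation, the CCC also penalizes a shift in mean or scale between predictions and labels, so a prediction sequence that is perfectly correlated with the labels but offset from them scores below 1.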

Related Material

[bibtex]

@InProceedings{Liu_2023_CVPR,
    author    = {Liu, Xiaolong and Sun, Lei and Jiang, Wenqiang and Zhang, Fengyuan and Deng, Yuanyuan and Huang, Zhaopei and Meng, Liyu and Liu, Yuchen and Liu, Chuanhe},
    title     = {EVAEF: Ensemble Valence-Arousal Estimation Framework in the Wild},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {5863-5871}
}