ABAW5 Challenge: A Facial Affect Recognition Approach Utilizing Transformer Encoder and Audiovisual Fusion

Ziyang Zhang, Liuwei An, Zishun Cui, Ao Xu, Tengteng Dong, Yueqi Jiang, Jingyi Shi, Xin Liu, Xiao Sun, Meng Wang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 5725-5734

Abstract


In this paper, we present our approach to the 5th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW). The competition comprises four sub-challenges: Valence-Arousal (VA) Estimation, Expression (Expr) Classification, Action Unit (AU) Detection, and Emotional Reaction Intensity (ERI) Estimation. To address these challenges, we leverage state-of-the-art (SOTA) models to extract robust audio and visual features. These features are then fused using a Transformer Encoder for the VA, Expr, and AU sub-challenges, and TEMMA for the ERI sub-challenge. To mitigate the effect of disparate feature dimensions, we introduce an Affine Module that aligns the features to the same dimension. Our results outperform the baseline by a substantial margin across all four sub-challenges. Specifically, for the VA Estimation sub-challenge, our method attains a mean Concordance Correlation Coefficient (CCC) of 0.5342, ranking fifth overall. For the Expression Classification sub-challenge, our approach achieves an average F1 Score of 0.3337, placing fourth overall. For the AU Detection sub-challenge, our method obtains an average F1 Score of 0.4752. Lastly, for the Emotional Reaction Intensity Estimation sub-challenge, our approach yields an average Pearson's correlation coefficient of 0.3968.
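For readers unfamiliar with the VA metric, the following is a minimal sketch of how a mean CCC can be computed, using the standard definition CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²) with population statistics. The function names `concordance_cc` and `mean_ccc` are hypothetical, and averaging the valence and arousal CCCs is assumed here as the meaning of "mean CCC"; this is not the authors' evaluation code.

```python
from statistics import fmean


def concordance_cc(pred, gold):
    """Concordance Correlation Coefficient between two equal-length sequences.

    Uses population (biased) variance and covariance, per the standard definition.
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("pred and gold must be non-empty and of equal length")

    mean_p = fmean(pred)
    mean_g = fmean(gold)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Both sequences are constant and identical; the ratio is undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


def mean_ccc(val_pred, val_gold, aro_pred, aro_gold):
    """Average of the valence CCC and the arousal CCC (assumed definition)."""
    return 0.5 * (concordance_cc(val_pred, val_gold)
                  + concordance_cc(aro_pred, aro_gold))
```

Unlike Pearson's correlation, CCC penalizes a constant offset: predictions `[2, 3, 4]` against labels `[1, 2, 3]` are perfectly correlated, yet the `(mean(x) − mean(y))²` term in the denominator lowers their CCC to 4/7.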

Related Material


@InProceedings{Zhang_2023_CVPR,
    author    = {Zhang, Ziyang and An, Liuwei and Cui, Zishun and Xu, Ao and Dong, Tengteng and Jiang, Yueqi and Shi, Jingyi and Liu, Xin and Sun, Xiao and Wang, Meng},
    title     = {ABAW5 Challenge: A Facial Affect Recognition Approach Utilizing Transformer Encoder and Audiovisual Fusion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {5725-5734}
}