Enhancing Emotion Recognition with Pre-trained Masked Autoencoders and Sequential Learning

Weiwei Zhou, Jiada Lu, Chengkun Ling, Weifeng Wang, Shaowei Liu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 4666-4672


Human emotion recognition plays a pivotal role in facilitating seamless interactions between humans and computers. This paper describes our approach to the Valence-Arousal (VA) Estimation Challenge, the Expression (Expr) Recognition Challenge, and the Action Unit (AU) Detection Challenge of the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW). We propose a novel approach to improving continuous emotion recognition. We first pre-train a Masked Autoencoder (MAE) on facial datasets and then fine-tune the model on the Aff-Wild2 dataset, which is annotated with expression (Expr) labels. The resulting pre-trained model serves as an effective visual feature extractor, which improves the robustness of the overall model. Furthermore, we improve continuous emotion recognition by integrating Temporal Convolutional Network (TCN) modules and Transformer Encoder modules into our framework. Our model outperforms the baseline, placing 3rd in the Valence-Arousal Estimation Challenge and 2nd in both the Expression Recognition Challenge and the Action Unit Detection Challenge.
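The abstract describes a two-stage pipeline: a pre-trained MAE encoder produces per-frame visual features, and TCN and Transformer Encoder modules then model how those features evolve over time. The sketch below illustrates only the TCN idea, as a minimal pure-Python residual block built on a dilated causal convolution over a sequence of per-frame feature vectors. The function names (dilated_causal_conv, tcn_block, tcn, receptive_field), the weight layout, the ReLU activation, the kernel size, the dilation schedule, and the choice of a causal (left-padded) convolution are all hypothetical assumptions for illustration. The abstract does not state the authors' TCN configuration, and the Transformer Encoder stage is not shown.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# Hypothetical layout (an assumption, not the paper's): weights[k][o] is the
# length-C_in weight vector for kernel tap k and output channel o.
Kernel = List[List[Vector]]


def dilated_causal_conv(seq: Sequence[Vector], weights: Kernel,
                        bias: Vector, dilation: int) -> List[Vector]:
    """Convolve T per-frame feature vectors along the time axis.

    Tap k reads the frame k * dilation steps in the past, so output t never
    depends on frames after t. Past frames before t = 0 count as zeros.
    """
    out: List[Vector] = []
    for t in range(len(seq)):
        y = list(bias)  # one accumulator per output channel
        for k, taps in enumerate(weights):
            src = t - k * dilation
            if src < 0:
                continue  # implicit zero padding on the left keeps it causal
            x = seq[src]
            for o, w in enumerate(taps):
                y[o] += sum(wi * xi for wi, xi in zip(w, x))
        out.append(y)
    return out


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def tcn_block(seq: Sequence[Vector], weights: Kernel,
              bias: Vector, dilation: int) -> List[Vector]:
    """Residual block: ReLU(conv(x)) + x. Assumes C_out == C_in."""
    conv = dilated_causal_conv(seq, weights, bias, dilation)
    return [[c + x for c, x in zip(relu(yc), xs)] for yc, xs in zip(conv, seq)]


def tcn(seq: Sequence[Vector],
        layers: List[Tuple[Kernel, Vector, int]]) -> List[Vector]:
    """Apply residual blocks in order, e.g. with dilations 1, 2, 4."""
    out = list(seq)
    for weights, bias, dilation in layers:
        out = tcn_block(out, weights, bias, dilation)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of current-and-past frames that can influence one output frame."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    # Toy 1-channel "features" standing in for MAE embeddings of 4 frames.
    frames = [[1.0], [2.0], [3.0], [4.0]]
    w = [[[1.0]], [[1.0]]]  # kernel size 2: y[t] = x[t] + x[t - dilation]
    print(dilated_causal_conv(frames, w, [0.0], dilation=1))  # [[1.0], [3.0], [5.0], [7.0]]
    print(dilated_causal_conv(frames, w, [0.0], dilation=2))  # [[1.0], [2.0], [4.0], [6.0]]
    print(receptive_field(2, [1, 2, 4]))  # 8
```

Stacking blocks with dilations 1, 2, 4 lets each output frame see 1 + (k - 1)(1 + 2 + 4) frames, for example 8 frames when k = 2, without the number of weights growing with the window length. A practical implementation would more likely use a framework layer such as PyTorch's torch.nn.Conv1d, which accepts dilation and padding arguments, and trim the extra outputs on the right to keep the convolution causal.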


@InProceedings{Zhou_2024_CVPR,
    author    = {Zhou, Weiwei and Lu, Jiada and Ling, Chengkun and Wang, Weifeng and Liu, Shaowei},
    title     = {Enhancing Emotion Recognition with Pre-trained Masked Autoencoders and Sequential Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {4666-4672}
}