Emotic Masked Autoencoder on Dual-views with Attention Fusion for Facial Expression Recognition

Xuan-Bach Nguyen, Hoang-Thien Nguyen, Thanh-Huy Nguyen, Nhu-Tai Do, Quang Vinh Dinh; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 4784-4792

Abstract


Facial Expression Recognition (FER) is a critical task in computer vision with diverse applications across domains. Addressing the challenge of limited FER datasets, which hampers the generalization capability of expression recognition models, is imperative for enhancing performance. Our paper presents an innovative approach that integrates the MAE-Face self-supervised learning (SSL) method with a multi-view Fusion Attention mechanism for expression classification, showcased in particular at the 6th Affective Behavior Analysis in-the-wild (ABAW) competition. By utilizing low-level feature information from the ipsilateral view (auxiliary view) before learning the high-level features that emphasize shifts in human facial expression, our work seeks to provide a straightforward yet innovative way to improve the examined view (main view). We also propose easy-to-implement, training-free frameworks aimed at highlighting key facial features, to determine whether such features can serve as guides that focus the model on pivotal local elements. The efficacy of this method is validated by improvements in model performance on the Aff-wild2 dataset, observed in both training and validation contexts.
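The abstract does not spell out the fusion module itself; the sketch below shows one plausible reading of attention-based fusion of a main (examined) view with an auxiliary (ipsilateral) view, where main-view tokens attend over auxiliary-view tokens and the result is added back residually. All names here (`attention_fusion`, the shapes, the residual design) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(main_feats, aux_feats):
    """Fuse main-view tokens with auxiliary-view tokens via cross-attention.

    main_feats: (N, d) query features from the examined (main) view.
    aux_feats:  (M, d) key/value features from the auxiliary view.
    Returns fused features of shape (N, d).
    (Hypothetical sketch; not the authors' exact module.)
    """
    d = main_feats.shape[-1]
    scores = main_feats @ aux_feats.T / np.sqrt(d)   # (N, M) similarity
    weights = softmax(scores, axis=-1)               # rows sum to 1
    attended = weights @ aux_feats                   # (N, d) aux summary
    return main_feats + attended                     # residual fusion

rng = np.random.default_rng(0)
main = rng.standard_normal((4, 8))   # 4 main-view tokens, dim 8
aux = rng.standard_normal((6, 8))    # 6 auxiliary-view tokens, dim 8
fused = attention_fusion(main, aux)
print(fused.shape)  # (4, 8)
```

In this residual form the auxiliary view can only add information on top of the main-view features, which matches the abstract's framing of the auxiliary view as a guide for improving the examined view rather than replacing it.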

Related Material


[pdf]
[bibtex]
@InProceedings{Nguyen_2024_CVPR,
    author    = {Nguyen, Xuan-Bach and Nguyen, Hoang-Thien and Nguyen, Thanh-Huy and Do, Nhu-Tai and Dinh, Quang Vinh},
    title     = {Emotic Masked Autoencoder on Dual-views with Attention Fusion for Facial Expression Recognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {4784-4792}
}