Learning Transferable Compound Expressions from Masked AutoEncoder Pretraining

Feng Qiu, Heming Du, Wei Zhang, Chen Liu, Lincheng Li, Tianchen Guo, Xin Yu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 4733-4741

Abstract


Video-based Compound Expression Recognition (CER) aims to identify compound expressions in everyday interactions on a per-frame basis. Despite rapid progress in Facial Expression Recognition (FER) for the basic emotions (e.g., surprised, sad, and fearful), CER for the compound emotions (e.g., fearfully surprised and sadly fearful) remains underexplored, with an evident lack of substantial datasets. In this paper, we design a framework that demonstrates the feasibility of predicting compound expressions in-the-wild without relying on domain-specific supervision. Specifically, we first train a model on a large-scale facial dataset using the Masked Autoencoder (MAE) approach to learn comprehensive facial features. Then, to tailor it for facial expression analysis, we fine-tune the ViT encoder on an Action Unit (AU) detection task. To address the issue of insufficient data, we transform the task of recognizing compound emotions into a multi-label recognition task for basic emotions. We train a network by fine-tuning the pretrained ViT encoder to predict the probability of each basic emotion, and then combine these probabilities to arrive at the final prediction for the compound emotions. Experiments conducted on the C-EXPR-DB dataset demonstrate the effectiveness of our framework in the frame-by-frame prediction of compound expressions in-the-wild. Our framework is recognized as the leading solution in the Compound Expression (CE) Recognition Challenge of the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW). More information about the competition can be found on the 6th ABAW website: https://affective-behavior-analysis-in-the-wild.github.io/6th/
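The last step described above, turning per-frame basic-emotion probabilities into a compound-emotion prediction, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the probabilities are combined, so we assume independent sigmoid outputs from a multi-label head and score each compound class as the product of its two constituent basic-emotion probabilities. The basic-emotion label order, the seven compound categories and their constituent pairs, and the helper names (`compound_scores`, `predict_compound`) are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical label order of the multi-label basic-emotion head.
BASIC = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

# Assumed compound categories, each mapped to its two constituent basic emotions.
COMPOUND = {
    "Fearfully Surprised": ("fear", "surprise"),
    "Happily Surprised": ("happiness", "surprise"),
    "Sadly Surprised": ("sadness", "surprise"),
    "Disgustedly Surprised": ("disgust", "surprise"),
    "Angrily Surprised": ("anger", "surprise"),
    "Sadly Fearful": ("sadness", "fear"),
    "Sadly Angry": ("sadness", "anger"),
}


def sigmoid(x):
    """Element-wise logistic function (independent per-label probabilities)."""
    return 1.0 / (1.0 + np.exp(-x))


def compound_scores(logits):
    """Map basic-emotion logits of shape (T, 6) to compound scores of shape (T, 7).

    Assumption: a compound score is the product of the probabilities of its
    two constituent basic emotions (a soft logical AND).
    """
    probs = sigmoid(np.asarray(logits, dtype=float))
    idx = {name: i for i, name in enumerate(BASIC)}
    cols = [probs[:, idx[a]] * probs[:, idx[b]] for a, b in COMPOUND.values()]
    return np.stack(cols, axis=1)


def predict_compound(logits):
    """Return the highest-scoring compound label for each frame."""
    names = list(COMPOUND)
    scores = compound_scores(logits)
    return [names[k] for k in scores.argmax(axis=1)]


if __name__ == "__main__":
    frame_logits = np.array([
        [-3, -3, 3, -3, -3, 3],  # frame 0: strong fear and surprise
        [3, -3, -3, -3, 3, -3],  # frame 1: strong anger and sadness
    ])
    print(predict_compound(frame_logits))  # ['Fearfully Surprised', 'Sadly Angry']
```

The product is only one plausible combination rule; averaging or taking the minimum of the two constituent probabilities would be equally simple drop-in alternatives, and the abstract does not indicate which (if any) of these the authors use.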

Related Material


@InProceedings{Qiu_2024_CVPR,
    author    = {Qiu, Feng and Du, Heming and Zhang, Wei and Liu, Chen and Li, Lincheng and Guo, Tianchen and Yu, Xin},
    title     = {Learning Transferable Compound Expressions from Masked AutoEncoder Pretraining},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {4733-4741}
}