EmotiEffNets for Facial Processing in Video-Based Valence-Arousal Prediction, Expression Classification and Action Unit Detection
In this article, pre-trained convolutional neural networks from the EmotiEffNet family are used for frame-level feature extraction in the downstream emotion analysis tasks of the fifth Affective Behavior Analysis in-the-wild (ABAW) competition. In particular, we propose an ensemble of a multi-layer perceptron and a LightAutoML-based classifier. Post-processing that smooths the predictions over sequential frames is also implemented. Experimental results on the large-scale Aff-Wild2 database demonstrate that our model significantly outperforms the baseline facial processing based on VGGFace and ResNet. For example, our macro-averaged F1-scores for facial expression recognition and action unit detection on the test set are 11-13% higher. Moreover, the concordance correlation coefficients for valence/arousal estimation are up to 30% higher than those of the baseline.
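The abstract does not specify the exact form of the temporal post-processing; a minimal sketch of one common choice, a moving-average (box) filter applied independently to each class channel of the per-frame scores, is shown below. The function name `smooth_predictions` and the window size are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def smooth_predictions(frame_scores: np.ndarray, window: int = 5) -> np.ndarray:
    """Box-filter smoothing of per-frame model outputs along the time axis.

    frame_scores: array of shape (num_frames, num_classes) with raw scores.
    window: odd smoothing window size, in frames (assumed value, for illustration).
    """
    kernel = np.ones(window) / window
    # Edge-pad in time so the output keeps the same number of frames.
    padded = np.pad(frame_scores, ((window // 2, window // 2), (0, 0)), mode="edge")
    # Convolve each class channel independently over the time axis.
    return np.stack(
        [np.convolve(padded[:, c], kernel, mode="valid")
         for c in range(frame_scores.shape[1])],
        axis=1,
    )
```

Such smoothing exploits the temporal continuity of emotions in video: isolated single-frame prediction flips are averaged away, while sustained expressions are preserved.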