EmotiEffNet and Temporal Convolutional Networks in Video-based Facial Expression Recognition and Action Unit Detection

Andrey V. Savchenko, Anna P. Sidorova; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 4849-4859

Abstract
This paper examines the video-based facial expression recognition and action unit detection tasks. We propose to use pre-trained EmotiEffNet models for frame-level facial feature extraction and to feed the extracted features into Temporal Convolutional Networks to take into account the dynamics of facial expressions. In addition, we study the possibility of combining facial processing with audio feature extraction to improve the accuracy of audio-visual expression recognition. Experimental results for two tasks from the sixth Affective Behavior Analysis in-the-Wild challenge demonstrate that our approach significantly improves quality metrics on the validation sets compared to existing non-ensemble techniques. As a result, our approach took third place in the action unit detection task and fourth place in the expression recognition task.
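The pipeline described above (per-frame embeddings passed through a temporal convolution so that each prediction can use preceding frames) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the kernel shapes, feature dimensionality, and the single dilated causal convolution layer are assumptions standing in for frame-level EmotiEffNet embeddings and a full TCN stack.

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation=1):
    """One dilated causal 1-D convolution along the time axis.

    x: (T, C_in) array of frame-level features (stand-in for
       EmotiEffNet embeddings of T video frames).
    w: (K, C_in, C_out) kernel of temporal width K.
    Returns a (T, C_out) array; the output at frame t depends
    only on frames <= t (causal left-padding), as in a TCN layer.
    """
    T, C_in = x.shape
    K, _, C_out = w.shape
    pad = (K - 1) * dilation  # left-pad so the output keeps length T
    xp = np.concatenate([np.zeros((pad, C_in)), x], axis=0)
    y = np.zeros((T, C_out))
    for k in range(K):
        # tap k looks back (K - 1 - k) * dilation frames
        y += xp[k * dilation : k * dilation + T] @ w[k]
    return y

# Toy usage: 3 frames, 1-dim features, width-2 summing kernel.
x = np.array([[1.0], [2.0], [3.0]])
w = np.ones((2, 1, 1))
print(dilated_causal_conv1d(x, w)[:, 0])  # each output sums the current and previous frame
```

A real TCN stacks several such layers with increasing dilation (1, 2, 4, ...) and residual connections, so the receptive field grows exponentially with depth while every prediction remains causal.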

Related Material

[bibtex]
@InProceedings{Savchenko_2024_CVPR,
    author    = {Savchenko, Andrey V. and Sidorova, Anna P.},
    title     = {EmotiEffNet and Temporal Convolutional Networks in Video-based Facial Expression Recognition and Action Unit Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {4849-4859}
}