MTMSN: Multi-Task and Multi-Modal Sequence Network for Facial Action Unit and Expression Recognition

Yue Jin, Tianqing Zheng, Chao Gao, Guoqiang Xu; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 3597-3602

Abstract


Facial action unit (AU) and basic expression recognition are two fundamental tasks in the area of human affective behavior analysis. Most existing methods are developed in restricted scenarios and are not practical for in-the-wild settings. The Affective Behavior Analysis in-the-wild (ABAW) 2021 Competition provides a benchmark for this in-the-wild problem. In this paper, we propose a multi-task and multi-modal sequence network (MTMSN) to mine the relationships between these two tasks and effectively utilize both the visual and audio information in videos. We use both AU and expression annotations to train the model and apply a sequence model to further capture associations between video frames. We achieve an AU score of 0.7508 and an expression score of 0.7574 on the validation set.
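
The abstract describes the architecture only at a high level. As a rough illustration, below is a minimal PyTorch-style sketch of a multi-task, multi-modal sequence model of the kind described: per-frame visual and audio features are fused, a recurrent layer models temporal context across frames, and two heads produce multi-label AU logits and expression class logits. All specifics here (the MTMSNSketch name, feature dimensions, the GRU, concatenation-based fusion, and the unweighted loss sum) are assumptions for illustration, not the authors' actual design.

    import torch
    import torch.nn as nn

    class MTMSNSketch(nn.Module):
        """Hypothetical sketch of a multi-task, multi-modal sequence model.
        Per-frame visual and audio features are fused, passed through a GRU,
        and fed to two task heads (multi-label AU detection, expression
        classification). Dimensions and layer choices are assumptions."""

        def __init__(self, vis_dim=512, aud_dim=128, hid_dim=256,
                     num_aus=12, num_expressions=7):
            super().__init__()
            self.fuse = nn.Linear(vis_dim + aud_dim, hid_dim)      # modality fusion
            self.gru = nn.GRU(hid_dim, hid_dim, batch_first=True)  # temporal model
            self.au_head = nn.Linear(hid_dim, num_aus)             # multi-label logits
            self.expr_head = nn.Linear(hid_dim, num_expressions)   # class logits

        def forward(self, vis_seq, aud_seq):
            # vis_seq: (batch, time, vis_dim); aud_seq: (batch, time, aud_dim)
            x = torch.relu(self.fuse(torch.cat([vis_seq, aud_seq], dim=-1)))
            x, _ = self.gru(x)  # frame-level temporal context
            return self.au_head(x), self.expr_head(x)

    # Joint multi-task training signal: BCE for AUs, cross-entropy for
    # expressions, summed without weighting (an assumption, not the paper's).
    model = MTMSNSketch()
    vis = torch.randn(2, 16, 512)
    aud = torch.randn(2, 16, 128)
    au_logits, expr_logits = model(vis, aud)
    au_loss = nn.BCEWithLogitsLoss()(
        au_logits, torch.randint(0, 2, (2, 16, 12)).float())
    expr_loss = nn.CrossEntropyLoss()(
        expr_logits.reshape(-1, 7), torch.randint(0, 7, (2 * 16,)))
    loss = au_loss + expr_loss

The multi-task aspect shows up in the two heads sharing one temporal backbone; the multi-modal aspect in the concatenation of visual and audio features before fusion. The paper's actual fusion and sequence-modeling choices may differ.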

Related Material


[bibtex]
@InProceedings{Jin_2021_ICCV,
  author    = {Jin, Yue and Zheng, Tianqing and Gao, Chao and Xu, Guoqiang},
  title     = {MTMSN: Multi-Task and Multi-Modal Sequence Network for Facial Action Unit and Expression Recognition},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2021},
  pages     = {3597-3602}
}