Emotion Recognition Based on Body and Context Fusion in the Wild

Yibo Huang, Hongqian Wen, Linbo Qing, Rulong Jin, Leiming Xiao; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 3609-3617

Abstract


Emotion recognition in the wild, under uncontrolled conditions, is challenging because facial expressions are often blurred or missing entirely in public spaces, while previous visual emotion recognition research has mainly focused on facial expression. In this paper, we present a learning-based algorithm that uses posture and context information to recognize emotion from video in the wild. The network has a three-branch architecture with three feature streams: body, skeleton, and context. The three streams are fused to predict a dimensional emotion representation: valence, arousal, and dominance. In addition, to address the lack of public-space video datasets that show complete individuals with blurred or occluded faces, we captured and labeled a new Body and Context Emotions Dataset (BCEmotion) in the wild. Using the BCEmotion dataset, we trained the proposed model, which jointly analyzes body and context in videos. Experimental results show that the proposed method effectively integrates the emotional information expressed by body and context, generalizes well, and is applicable to public-space video data.
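
The abstract does not say how the three streams are fused, so the following is only an illustrative sketch of one common option: concatenating per-stream feature vectors and applying a single linear regression head with three outputs. The function names, feature sizes, and random weights are all hypothetical and are not taken from the paper.

```python
import random


def linear(weights, bias, x):
    """Apply a dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def fuse_and_predict(body_feat, skeleton_feat, context_feat, weights, bias):
    """Concatenate three stream features and regress three emotion dimensions.

    Concatenation followed by a linear head is an assumption made for this
    sketch; the paper's actual fusion scheme may differ.
    """
    fused = list(body_feat) + list(skeleton_feat) + list(context_feat)
    valence, arousal, dominance = linear(weights, bias, fused)
    return {"valence": valence, "arousal": arousal, "dominance": dominance}


# Hypothetical stand-ins for the outputs of the body, skeleton, and
# context streams (real features would come from trained networks).
rng = random.Random(0)
body = [rng.uniform(-1, 1) for _ in range(4)]
skeleton = [rng.uniform(-1, 1) for _ in range(3)]
context = [rng.uniform(-1, 1) for _ in range(2)]

# Untrained, randomly initialised regression head: 3 outputs, 9 inputs.
weights = [[rng.uniform(-0.1, 0.1) for _ in range(9)] for _ in range(3)]
bias = [0.0, 0.0, 0.0]

prediction = fuse_and_predict(body, skeleton, context, weights, bias)
print(prediction)
```

In a trained model, the weights of the regression head would be learned jointly with the three streams rather than sampled at random as they are here.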

Related Material


[bibtex]
@InProceedings{Huang_2021_ICCV,
    author    = {Huang, Yibo and Wen, Hongqian and Qing, Linbo and Jin, Rulong and Xiao, Leiming},
    title     = {Emotion Recognition Based on Body and Context Fusion in the Wild},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2021},
    pages     = {3609-3617}
}