Video-Based Frame-Level Facial Analysis of Affective Behavior on Mobile Devices Using EfficientNets
In this paper, we consider the problem of real-time video-based facial emotion analytics, namely, facial expression recognition, prediction of valence and arousal, and detection of action units. We propose a novel frame-level emotion recognition algorithm that extracts facial features with a single EfficientNet model pre-trained on AffectNet. The predictions for sequential frames are smoothed using mean or median filters. It is demonstrated that our approach may be implemented even for video analytics on mobile devices. Experimental results on the large-scale Aff-Wild2 database from the third Affective Behavior Analysis in-the-wild (ABAW) Competition demonstrate that our simple model significantly outperforms the VGGFace baseline. In particular, our method achieves performance measures 0.1-0.5 higher on the test sets of the uni-task Expression Classification, Valence-Arousal Estimation, Action Unit Detection and Multi-Task Learning challenges. Our team took 3rd place in the Multi-Task Learning challenge and 4th place in the Valence-Arousal and Expression challenges. Due to its simplicity, the proposed approach may be considered a new baseline for all four sub-challenges.
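The temporal smoothing step described above can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: `smooth_predictions` is a hypothetical helper that applies a sliding mean or median filter over the per-frame prediction scores produced by the frame-level model, suppressing spurious single-frame predictions.

```python
import numpy as np

def smooth_predictions(frame_scores: np.ndarray, window: int = 5,
                       mode: str = "median") -> np.ndarray:
    """Smooth per-frame scores (num_frames x num_classes) over time
    with a sliding mean or median filter. Hypothetical helper
    illustrating the smoothing idea from the paper."""
    half = window // 2
    # Replicate boundary frames so every frame has a full window.
    padded = np.pad(frame_scores, ((half, half), (0, 0)), mode="edge")
    out = np.empty(frame_scores.shape, dtype=float)
    for t in range(frame_scores.shape[0]):
        win = padded[t:t + window]
        out[t] = np.median(win, axis=0) if mode == "median" else win.mean(axis=0)
    return out

# Example: scores for 6 frames over 3 expression classes,
# with a spurious single-frame prediction at frame 2.
scores = np.array([[0.80, 0.10, 0.10],
                   [0.70, 0.20, 0.10],
                   [0.10, 0.10, 0.80],   # outlier frame
                   [0.90, 0.05, 0.05],
                   [0.80, 0.10, 0.10],
                   [0.75, 0.15, 0.10]])
smoothed = smooth_predictions(scores, window=3, mode="median")
labels = smoothed.argmax(axis=1)  # the outlier at frame 2 is suppressed
```

The median filter is preferable when isolated frames (e.g., due to motion blur or occlusion) produce outlier scores, while the mean filter yields a smoother trajectory for continuous targets such as valence and arousal.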