Noisy Annotations Robust Consensual Collaborative Affect Expression Recognition

Darshan Gera, S. Balasubramanian; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 3585-3592

Abstract


Noisy annotation of large-scale facial expression datasets has been a key challenge for Facial Expression Recognition (FER) in the wild via deep learning. During the early learning stage, deep networks fit the clean data but, owing to their memorization ability, eventually overfit the noisy labels, which limits FER performance. To overcome this challenge on Aff-Wild2, this paper uses a robust end-to-end Consensual Collaborative Training (CCT) framework. CCT jointly co-trains three networks using a convex combination of a supervision loss and a consistency loss. A dynamic balancing scheme transitions the objective from the supervision loss early in training to the consistency loss in the later stage. During initial training, the supervision loss is given a higher weight, so the networks implicitly learn from clean samples. As training progresses, the consistency loss, based on the consensus of predictions among the networks, is used to learn effectively from all samples, thus preventing overfitting to noisily annotated samples. Further, CCT makes no assumption about the noise rate. The effectiveness of CCT is demonstrated on the challenging Aff-Wild2 dataset through quantitative evaluations and ablation studies.
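The convex combination described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the linear ramp schedule, the cap `lam_max`, and the squared-deviation-from-consensus form of the consistency loss are all assumptions made here for concreteness; the abstract only specifies a convex combination with a dynamic balance shifting from supervision to consistency.

```python
import numpy as np

def dynamic_weight(epoch, total_epochs, lam_max=0.9):
    # Hypothetical schedule: ramp the consistency weight linearly
    # from 0 toward lam_max over training (an assumption, not the
    # paper's exact scheme).
    return lam_max * min(1.0, epoch / total_epochs)

def cross_entropy(probs, label):
    # Supervision loss for one network's softmax output.
    return -np.log(probs[label] + 1e-12)

def consistency_loss(prob_list):
    # Illustrative consensus term: mean squared deviation of each
    # network's prediction from the average (consensus) prediction.
    consensus = np.mean(prob_list, axis=0)
    return float(np.mean([np.sum((p - consensus) ** 2) for p in prob_list]))

def cct_loss(prob_list, label, epoch, total_epochs):
    # Convex combination: (1 - lam) * supervision + lam * consistency.
    lam = dynamic_weight(epoch, total_epochs)
    sup = float(np.mean([cross_entropy(p, label) for p in prob_list]))
    return (1 - lam) * sup + lam * consistency_loss(prob_list)
```

At epoch 0 the weight `lam` is 0, so the objective reduces to pure supervision on (implicitly clean) samples; as `epoch` approaches `total_epochs`, the consensus term dominates, mirroring the transition the abstract describes.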

Related Material


[bibtex]
@InProceedings{Gera_2021_ICCV,
  author    = {Gera, Darshan and Balasubramanian, S.},
  title     = {Noisy Annotations Robust Consensual Collaborative Affect Expression Recognition},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2021},
  pages     = {3585-3592}
}