Analysis of Emotion Annotation Strength Improves Generalization in Speech Emotion Recognition Models

Joao Palotti, Gagan Narula, Lekan Raheem, Herbert Bay; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 5829-5837

Abstract


Recent advances in speech emotion recognition (SER) have relied on a mix of acted and in-the-wild research datasets. It is unclear whether annotations in these datasets are of similar strength or quality, whether they can reliably be detected by other human annotators, and to what extent emotion classification knowledge can be transferred between acted and in-the-wild data. A well-known, large in-the-wild dataset for emotion classification and sentiment analysis is the CMU-MOSEI video dataset. The raw annotations of CMU-MOSEI are "soft labels" on a Likert scale. Usually, experiments are performed with a simple binarization of these fine-grained labels. In this work, we re-annotated 1% of the data from two acted and two in-the-wild datasets to analyze the strength of emotion annotation per label, compare annotation accuracy between acted and in-the-wild data, and identify an appropriate threshold for CMU-MOSEI label binarization. We report a significant improvement (7% increase on weighted average F1) in emotion classification, using the same model architecture, by simply identifying a better threshold for CMU-MOSEI. Further, we show that the emotion annotation strength of acted and in-the-wild data is similar, and that the same model architecture generalizes to the same extent when trained on acted and tested on in-the-wild data, and vice versa.
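
The binarization step mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `binarize`, the three emotion columns, the example scores, and both threshold values are hypothetical and are not taken from the paper or from CMU-MOSEI. The sketch only shows how moving the cut-off changes which soft labels count as "emotion present".

```python
from typing import List, Sequence


def binarize(soft_labels: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Mark an emotion as present (1) when its soft score exceeds the threshold."""
    return [[1 if score > threshold else 0 for score in row] for row in soft_labels]


# Hypothetical averaged annotator scores for three clips over
# (happy, sad, angry). These are made-up values, not real CMU-MOSEI data.
scores = [
    [2.0, 0.0, 0.3],
    [0.3, 1.0, 0.0],
    [0.0, 0.7, 1.7],
]

# A permissive cut-off: any nonzero score counts as the emotion being present.
loose = binarize(scores, threshold=0.0)

# A stricter, illustrative cut-off: weak annotations are treated as absent.
strict = binarize(scores, threshold=0.5)

print(loose)
print(strict)
```

With the stricter cut-off, the weakly annotated entries (scores of 0.3) drop out of the positive class, which is the kind of effect a threshold choice has on the training labels. The threshold value the authors actually selected is not stated in the abstract.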

Related Material


[bibtex]
@InProceedings{Palotti_2023_CVPR,
    author    = {Palotti, Joao and Narula, Gagan and Raheem, Lekan and Bay, Herbert},
    title     = {Analysis of Emotion Annotation Strength Improves Generalization in Speech Emotion Recognition Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {5829-5837}
}