Classification-Aware Semi-Supervised Domain Adaptation

Gewen He, Xiaofeng Liu, Fangfang Fan, Jane You; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 964-965


Deep neural networks are usually data-hungry, but manual annotation can be costly in many specific tasks, for instance emotion recognition from audio. However, a large number of labeled image-based facial expression recognition datasets are publicly available. How these images can help audio emotion recognition with limited labeled data, given their inherent correlations, is a meaningful and challenging question. In this paper, we propose a semi-supervised adversarial network that allows knowledge transfer from the labeled videos to the heterogeneous labeled audio domain, hence enhancing audio emotion recognition performance. Specifically, face image samples are translated to spectrograms class-wise. To harness the translated samples in a sparsely distributed area and construct a tighter decision boundary, we propose to precisely estimate the density in feature space and incorporate the reliable low-density samples with an annealing scheme. Moreover, the unlabeled audio samples are collected along the high-density path in a graph representation. As a possible "recognition via generation" framework, we empirically demonstrate its effectiveness on several audio emotion recognition benchmarks. We also demonstrate its generality on recent large-scale semi-supervised domain adaptation tasks.

Related Material

author = {He, Gewen and Liu, Xiaofeng and Fan, Fangfang and You, Jane},
title = {Classification-Aware Semi-Supervised Domain Adaptation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2020}