Learning Facial Action Units From Web Images With Scalable Weakly Supervised Clustering

Kaili Zhao, Wen-Sheng Chu, Aleix M. Martinez; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 2090-2099

Abstract


We present a scalable weakly supervised clustering approach to learn facial action units (AUs) from large collections of freely available web images. Unlike most existing methods (e.g., CNNs) that rely on fully annotated data, our method exploits web images with inaccurate annotations. Specifically, we derive a weakly supervised spectral algorithm that learns an embedding space to couple image appearance and semantics. The algorithm has an efficient gradient update and scales to large quantities of images with a stochastic extension. With the learned embedding space, we adopt rank-order clustering to identify groups of visually and semantically similar images, and re-annotate these groups for training AU classifiers. Evaluation on the 1-million-image EmotioNet dataset demonstrates the effectiveness of our approach: (1) our learned annotations reach on average 91.3% agreement with human annotations on 7 common AUs, (2) classifiers trained with re-annotated images perform comparably to, and sometimes even better than, their supervised CNN-based counterparts, and (3) our method offers intuitive outlier/noise pruning instead of forcing one annotation onto every image. Code is available.
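To make the rank-order clustering step more concrete, the following is a minimal sketch of a rank-order distance computed on embedding vectors, followed by a simple threshold-based linking step. It is not the authors' released code: the function names `rank_order_distance` and `link_clusters`, the use of squared Euclidean distance for the neighbor lists, the single-link grouping, and the threshold value are all assumptions made for illustration. The learned weakly supervised embedding itself is not reproduced here; any array of feature vectors can stand in for it.

```python
import numpy as np


def rank_order_distance(X):
    """Symmetric rank-order distance between all rows of X (hypothetical helper)."""
    n = X.shape[0]
    # Squared Euclidean distances; an assumed choice for building neighbor lists.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1).astype(float)
    np.fill_diagonal(d, -1.0)  # ensure each point is rank 0 in its own list
    order = np.argsort(d, axis=1, kind="stable")  # order[a, i]: i-th neighbor of a
    rank = np.empty_like(order)
    rank[np.arange(n)[:, None], order] = np.arange(n)[None, :]  # rank[a, x]

    # Asymmetric term: sum of the ranks, in b's list, of a's neighbors up to b.
    asym = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            if a != b:
                top = order[a, : rank[a, b] + 1]
                asym[a, b] = rank[b, top].sum()

    # Symmetrize and normalize by the smaller of the two mutual ranks.
    R = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            if a != b:
                R[a, b] = (asym[a, b] + asym[b, a]) / min(rank[a, b], rank[b, a])
    return R


def link_clusters(R, threshold):
    """Group points whose rank-order distance is below threshold (single-link)."""
    n = R.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for a in range(n):
        for b in range(a + 1, n):
            if R[a, b] < threshold:
                parent[find(a)] = find(b)
    return [find(i) for i in range(n)]
```

On a toy input of two well-separated groups of points, the within-group distances come out smaller than the cross-group ones, so a threshold between the two ranges recovers the groups. A suitable threshold depends on the data and neighborhood sizes, and a full rank-order clustering would typically merge clusters iteratively rather than link pairs once.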

Related Material


[bibtex]
@InProceedings{Zhao_2018_CVPR,
author = {Zhao, Kaili and Chu, Wen-Sheng and Martinez, Aleix M.},
title = {Learning Facial Action Units From Web Images With Scalable Weakly Supervised Clustering},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}