Action Unit Detection With Region Adaptation, Multi-Labeling Learning and Optimal Temporal Fusing

Wei Li, Farnaz Abtahi, Zhigang Zhu; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1841-1850

Abstract


Action Unit (AU) detection has become essential for facial analysis. Many existing approaches struggle with aligning different face regions, fusing temporal information effectively, and training a single model for multiple AU labels. To better address these problems, we propose a deep learning framework for AU detection with region of interest (ROI) adaptation, integrated multi-label learning, and optimal LSTM-based temporal fusing. First, an ROI cropping net is designed so that specific regions of interest on the face are learned independently; each sub-region has a local convolutional neural network (CNN) whose convolutional filters are trained only for the corresponding region. Second, multi-label learning is employed to integrate the outputs of those individual ROI cropping nets, which learns the inter-relationships of various AUs and acquires global features across sub-regions for AU detection. Finally, the optimal selection of multiple LSTM layers is carried out to best fuse temporal features and make the AU prediction as accurate as possible. The proposed approach is evaluated on two popular AU detection datasets, BP4D and DISFA, and significantly outperforms the state of the art, with average improvements of around 13% on BP4D and 25% on DISFA.
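To make the first two steps of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows how fixed-size regions could be cropped from a feature map around facial landmarks, and how several AU labels could be scored jointly. The helper names (`crop_roi`, `multi_label_bce`), the landmark names and coordinates, and the choice of a per-label sigmoid with binary cross-entropy are all assumptions made for illustration; the per-region CNNs and the LSTM-based temporal fusion are omitted.

```python
import math


def crop_roi(feature_map, center, size):
    """Crop a size x size window around (row, col).

    The window is shifted, not shrunk, when it would cross the border,
    so every ROI has the same shape (an assumption of this sketch).
    """
    rows, cols = len(feature_map), len(feature_map[0])
    if size > rows or size > cols:
        raise ValueError("ROI is larger than the feature map")
    half = size // 2
    top = min(max(center[0] - half, 0), rows - size)
    left = min(max(center[1] - half, 0), cols - size)
    return [row[left:left + size] for row in feature_map[top:top + size]]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_label_bce(logits, labels):
    """Mean binary cross-entropy over AU labels, one sigmoid per label."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)


# Toy 6x6 "feature map" whose entries encode their position (10*row + col).
fmap = [[10 * r + c for c in range(6)] for r in range(6)]

# Hypothetical landmark positions; a real system would obtain these
# from a facial landmark detector.
landmarks = {"brow": (1, 2), "mouth_corner": (5, 5)}
rois = {name: crop_roi(fmap, c, 3) for name, c in landmarks.items()}

# Two AUs scored at once: the first is active, the second is not.
loss = multi_label_bce([0.0, 0.0], [1, 0])
```

In this sketch each cropped region would feed its own small network, and the per-label loss lets one model be trained on all AU labels at once, which is the general idea behind the region adaptation and multi-label steps described above.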

Related Material


[bibtex]
@InProceedings{Li_2017_CVPR,
author = {Li, Wei and Abtahi, Farnaz and Zhu, Zhigang},
title = {Action Unit Detection With Region Adaptation, Multi-Labeling Learning and Optimal Temporal Fusing},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {July},
year = {2017},
pages = {1841--1850}
}