Fine-Grained Head Pose Estimation Without Keypoints

Nataniel Ruiz, Eunji Chong, James M. Rehg; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018, pp. 2074-2083


Estimating the head pose of a person is a crucial problem with many applications, such as aiding gaze estimation, modeling attention, fitting 3D models to video, and performing face alignment. Traditionally, head pose is computed by estimating keypoints from the target face and solving the 2D-to-3D correspondence problem with a mean human head model. We argue that this is a fragile method because it relies entirely on landmark detection performance, the extraneous head model, and an ad-hoc fitting step. We present an elegant and robust way to determine pose by training a multi-loss convolutional neural network on 300W-LP, a large synthetically expanded dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from image intensities through joint binned pose classification and regression. We present empirical tests on common in-the-wild pose benchmark datasets which show state-of-the-art results. Additionally, we test our method on a dataset usually used for pose estimation using depth and start to close the gap with state-of-the-art depth pose methods. We open-source our training and testing code and release our pre-trained models.
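The "joint binned pose classification and regression" idea can be illustrated for a single angle with a minimal sketch. The abstract does not give the bin layout or loss weighting, so the bin width, angle range, number of bins, and the weight `alpha` below are assumptions chosen for illustration, and the function names are hypothetical. The sketch combines a cross-entropy term over angle bins with a squared-error term on the expected angle computed from the bin probabilities.

```python
import math

# Assumed binning for illustration only: 66 bins of 3 degrees covering [-99, 99).
BIN_WIDTH = 3.0
ANGLE_MIN = -99.0
NUM_BINS = 66


def angle_to_bin(angle_deg):
    """Map a continuous angle in degrees to a bin index, clamped to the valid range."""
    idx = int((angle_deg - ANGLE_MIN) // BIN_WIDTH)
    return min(max(idx, 0), NUM_BINS - 1)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_angle(probs):
    """Continuous angle estimate: probability-weighted sum of bin centers."""
    return sum(p * (ANGLE_MIN + (i + 0.5) * BIN_WIDTH) for i, p in enumerate(probs))


def multi_loss(logits, true_angle_deg, alpha=1.0):
    """Cross-entropy on the bin label plus alpha times squared error on the expected angle.

    alpha is a hypothetical weighting parameter, not a value from the abstract.
    """
    probs = softmax(logits)
    classification = -math.log(probs[angle_to_bin(true_angle_deg)])
    regression = (expected_angle(probs) - true_angle_deg) ** 2
    return classification + alpha * regression
```

Under these assumptions, one such loss would be computed separately for yaw, pitch and roll and the three terms summed; the classification term pushes the network toward the right coarse bin, while the regression term refines the estimate within and across bins.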

Related Material

author = {Ruiz, Nataniel and Chong, Eunji and Rehg, James M.},
title = {Fine-Grained Head Pose Estimation Without Keypoints},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2018}