Dense 3D Regression for Hand Pose Estimation

Chengde Wan, Thomas Probst, Luc Van Gool, Angela Yao; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 5147-5156


We present a simple and effective method for 3D hand pose estimation from a single depth frame. In contrast to previous state-of-the-art methods based on holistic 3D regression, our method performs dense pixel-wise estimation. This is achieved through careful design choices in pose parameterization, which leverage both the 2D and 3D properties of the depth map. Specifically, we decompose the pose parameters into a set of per-pixel estimations, i.e., 2D heat maps, 3D heat maps and unit 3D direction vector fields. The 2D/3D joint heat maps and 3D joint offsets are estimated via multi-task network cascades, which are trained end-to-end. The pixel-wise estimations can be directly translated into a vote casting scheme. A variant of mean shift is then used to aggregate the local votes while explicitly enforcing consensus between the global 3D estimate and the pixel-wise 2D and 3D estimations. Our method is efficient and highly accurate. On the MSRA and NYU hand datasets, our method outperforms all previous state-of-the-art methods by a large margin. On the ICVL hand dataset, where accuracy is nearly saturated, our method achieves accuracy similar to the best previous method and outperforms the others. Code will be made available.
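
The vote casting and mean-shift aggregation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (cast_votes, weighted_mean_shift), the linear heat-to-distance encoding, the Gaussian kernel, and the radius and bandwidth parameters are all assumptions made for illustration. Each pixel contributes a vote (its 3D position plus a predicted offset along its unit direction), and a heat-weighted mean shift seeks the mode of those votes for one joint.

```python
import numpy as np


def cast_votes(points, heat, directions, radius):
    """Turn per-pixel estimates into 3D votes for a single joint.

    points:     (N, 3) back-projected 3D positions of the pixels
    heat:       (N,)   3D heat-map values in [0, 1]
    directions: (N, 3) unit vectors pointing from each pixel toward the joint
    radius:     assumed support radius of the heat map (same unit as points)
    """
    # Assumed encoding: heat is 1 at the joint and falls linearly to 0 at `radius`,
    # so the distance to the joint is recovered as radius * (1 - heat).
    dist = radius * (1.0 - heat)
    return points + directions * dist[:, None]


def weighted_mean_shift(votes, weights, bandwidth, iters=20, tol=1e-6):
    """Find a mode of the weighted votes with a Gaussian-kernel mean shift."""
    # Start from the most confident vote.
    mode = votes[np.argmax(weights)].copy()
    for _ in range(iters):
        d2 = np.sum((votes - mode) ** 2, axis=1)
        k = weights * np.exp(-0.5 * d2 / bandwidth ** 2)
        total = k.sum()
        if total <= 0.0:
            break  # no vote carries any weight; keep the current estimate
        new_mode = (k[:, None] * votes).sum(axis=0) / total
        converged = np.linalg.norm(new_mode - mode) < tol
        mode = new_mode
        if converged:
            break
    return mode
```

In this sketch the heat-map value doubles as the vote weight, so pixels far from the joint have little influence on the aggregated estimate.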

Related Material

author = {Wan, Chengde and Probst, Thomas and Van Gool, Luc and Yao, Angela},
title = {Dense 3D Regression for Hand Pose Estimation},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}