Single-Network Whole-Body Pose Estimation

Hidalgo, Gines; Raaj, Yaadhav; Idrees, Haroon; Xiang, Donglai; Joo, Hanbyul; Simon, Tomas; Sheikh, Yaser

Gines Hidalgo, Yaadhav Raaj, Haroon Idrees, Donglai Xiang, Hanbyul Joo, Tomas Simon, Yaser Sheikh; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 6982-6991

Abstract

We present the first single-network approach for 2D whole-body pose estimation, which entails simultaneous localization of body, face, hands, and feet keypoints. Due to the bottom-up formulation, our method maintains constant real-time performance regardless of the number of people in the image. The network is trained in a single stage using multi-task learning, through an improved architecture which can handle scale differences between body/foot and face/hand keypoints. Our approach considerably improves upon OpenPose, the only work so far capable of whole-body pose estimation, both in terms of speed and global accuracy. Unlike OpenPose, our method does not need to run an additional network for each hand and face candidate, making it substantially faster for multi-person scenarios. This work directly results in a reduction of computational complexity for applications that require 2D whole-body information (e.g., VR/AR, re-targeting). In addition, it yields higher accuracy, especially for occluded, blurry, and low-resolution faces and hands. For code, trained models, and validation benchmarks, visit our project page: https://github.com/CMU-Perceptual-Computing-Lab/openpose_train.

Related Material

[bibtex]

@InProceedings{Hidalgo_2019_ICCV,
author = {Hidalgo, Gines and Raaj, Yaadhav and Idrees, Haroon and Xiang, Donglai and Joo, Hanbyul and Simon, Tomas and Sheikh, Yaser},
title = {Single-Network Whole-Body Pose Estimation},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019},
pages = {6982--6991}
}