Optimizing Network Structure for 3D Human Pose Estimation

Hai Ci, Chunyu Wang, Xiaoxuan Ma, Yizhou Wang; The IEEE International Conference on Computer Vision (ICCV), 2019, pp. 2262-2271


A human pose is naturally represented as a graph in which the joints are the nodes and the bones are the edges, so it is natural to apply a Graph Convolutional Network (GCN) to estimate 3D poses from 2D poses. In this work, we propose a generic formulation of which both the GCN and the Fully Connected Network (FCN) are special cases. From this formulation, we discover that the GCN has limited representation power when used for estimating 3D poses. We overcome this limitation by introducing the Locally Connected Network (LCN), which is naturally implemented by this generic formulation. It notably improves the representation capability over the GCN. In addition, since every joint is connected only to a few joints in its neighborhood, the LCN has strong generalization power. Experiments on public datasets show that it (1) outperforms the state of the art; (2) is less data hungry than alternative models; and (3) generalizes well to unseen actions and datasets.
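
The abstract does not spell out the generic formulation, so the following is only an illustrative sketch of one way such a unification could look, not the paper's actual equations. It assumes a layer in which every ordered joint pair (i, j) has its own weight matrix and a scalar mask entry decides whether joint j contributes to joint i. Under that assumption, an all-ones mask gives an FCN, a single shared weight matrix with a row-normalized adjacency mask gives a GCN-style layer, and a binary neighborhood mask with unshared weights gives an LCN-style layer. The names `generic_layer`, `adj`, `w_free` and `w_shared`, the 5-joint chain skeleton, and the row normalization are all hypothetical choices made for illustration.

```python
import numpy as np


def generic_layer(x, weights, mask):
    """Masked per-pair linear layer (hypothetical formulation).

    x:       (J, d_in)              input features, one row per joint
    weights: (J, J, d_in, d_out)    one weight matrix per joint pair (i, j)
    mask:    (J, J)                 mask[i, j] scales joint j's contribution to joint i
    returns: (J, d_out)             out[i] = sum_j mask[i, j] * (x[j] @ weights[i, j])
    """
    return np.einsum("ij,jd,ijde->ie", mask, x, weights)


rng = np.random.default_rng(0)
J, d_in, d_out = 5, 2, 3

# Toy skeleton: a 5-joint chain with self-loops (an assumption, not a real dataset).
adj = np.eye(J)
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[a, b] = adj[b, a] = 1.0

x = rng.normal(size=(J, d_in))

# FCN case: every joint sees every joint, with unshared weights.
w_free = rng.normal(size=(J, J, d_in, d_out))
y_fcn = generic_layer(x, w_free, np.ones((J, J)))

# GCN-style case: one weight matrix shared by all pairs, mask = normalized adjacency.
w_shared = rng.normal(size=(d_in, d_out))
a_hat = adj / adj.sum(axis=1, keepdims=True)  # row normalization (assumed)
w_tied = np.broadcast_to(w_shared, (J, J, d_in, d_out))
y_gcn = generic_layer(x, w_tied, a_hat)

# LCN-style case: only neighboring joints are connected, but weights are unshared.
y_lcn = generic_layer(x, w_free, adj)
```

In this sketch the GCN-style layer has `d_in * d_out` free parameters regardless of the number of joints, whereas the LCN-style layer has a separate matrix for every connected pair, which is one way to read the abstract's claim that sharing limits representation power while sparse connectivity still constrains the model.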

Related Material

author = {Ci, Hai and Wang, Chunyu and Ma, Xiaoxuan and Wang, Yizhou},
title = {Optimizing Network Structure for 3D Human Pose Estimation},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}