Optimizing Network Structure for 3D Human Pose Estimation

Hai Ci, Chunyu Wang, Xiaoxuan Ma, Yizhou Wang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 2262-2271

Abstract


A human pose is naturally represented as a graph where the joints are the nodes and the bones are the edges. It is therefore natural to apply a Graph Convolutional Network (GCN) to estimate 3D poses from 2D poses. In this work, we propose a generic formulation of which both GCN and Fully Connected Network (FCN) are special cases. From this formulation, we discover that GCN has limited representation power when used for estimating 3D poses. We overcome the limitation by introducing a Locally Connected Network (LCN), which is naturally implemented by this generic formulation. It notably improves the representation capability over GCN. In addition, since every joint is only connected to a few joints in its neighborhood, it has strong generalization power. Experiments on public datasets show that it: (1) outperforms the state of the art; (2) is less data-hungry than alternative models; (3) generalizes well to unseen actions and datasets.
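The relationship between the three network types can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes one plausible reading of the generic formulation: every pair of joints (i, j) has its own weight matrix, and a 0/1 structure mask decides which pairs are connected. Under that assumption, an all-ones mask behaves like an FCN, a skeleton-neighborhood mask with unshared weights behaves like an LCN, and the same mask with one weight matrix shared across all pairs behaves like a GCN. The function name `generic_layer`, the toy 5-joint chain skeleton, and the omission of adjacency normalization and nonlinearities are all assumptions made for illustration.

```python
import numpy as np


def generic_layer(x, weights, mask):
    """Hypothetical generic layer (illustrative, not from the paper's code).

    x:       (J, D_in)  per-joint input features
    weights: (J, J, D_in, D_out)  one weight matrix per (output, input) joint pair
    mask:    (J, J)  structure matrix; mask[i, j] = 1 if joint j feeds joint i
    Returns: (J, D_out), where row i is sum_j mask[i, j] * (x[j] @ weights[i, j])
    """
    return np.einsum("ij,jd,ijde->ie", mask, x, weights)


rng = np.random.default_rng(0)
J, D_in, D_out = 5, 2, 3
x = rng.normal(size=(J, D_in))

# Toy chain skeleton 0-1-2-3-4 with self-loops (an assumption, not a real skeleton).
adj = np.eye(J)
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[a, b] = adj[b, a] = 1.0

# Unshared weights: a distinct matrix for every joint pair.
w = rng.normal(size=(J, J, D_in, D_out))

# FCN-like case: every joint is connected to every joint.
fcn_out = generic_layer(x, w, np.ones((J, J)))

# LCN-like case: unshared weights, but only skeleton neighbors are connected.
lcn_out = generic_layer(x, w, adj)

# GCN-like case: same mask, but one weight matrix shared by all joint pairs.
w_shared = rng.normal(size=(D_in, D_out))
gcn_out = generic_layer(x, np.broadcast_to(w_shared, (J, J, D_in, D_out)), adj)
```

In this sketch the GCN-like output reduces to `adj @ x @ w_shared`, which makes the weight-sharing restriction explicit, while the LCN-like layer keeps per-joint weights and only limits which joints can influence each other.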

Related Material


[bibtex]
@InProceedings{Ci_2019_ICCV,
author = {Ci, Hai and Wang, Chunyu and Ma, Xiaoxuan and Wang, Yizhou},
title = {Optimizing Network Structure for 3D Human Pose Estimation},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019},
pages = {2262-2271}
}