Towards Alleviating the Modeling Ambiguity of Unsupervised Monocular 3D Human Pose Estimation

Yu, Zhenbo; Ni, Bingbing; Xu, Jingwei; Wang, Junjie; Zhao, Chenglong; Zhang, Wenjun

Zhenbo Yu, Bingbing Ni, Jingwei Xu, Junjie Wang, Chenglong Zhao, Wenjun Zhang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 8651-8660

Abstract

In this work, we study the ambiguity problem in unsupervised 3D human pose estimation from its 2D counterpart. On one hand, without explicit annotation, the scale of the 3D pose is difficult to capture accurately (scale ambiguity). On the other hand, one 2D pose might correspond to multiple 3D gestures, so the lifting procedure is inherently ambiguous (pose ambiguity). Previous methods generally use temporal constraints (e.g., constant bone length and motion smoothness) to alleviate these issues. However, these methods commonly force the outputs to fulfill multiple training objectives simultaneously, which often leads to sub-optimal results. In contrast to the majority of previous works, we propose to split the whole problem into two sub-tasks, i.e., optimizing the 2D input poses via a scale estimation module and then mapping the optimized 2D pose to its 3D counterpart via a pose lifting module. Furthermore, two temporal constraints are proposed to alleviate the scale and pose ambiguity, respectively. The two modules are optimized via an iterative training scheme with the corresponding temporal constraints, which effectively reduces the learning difficulty and leads to better performance. Results on the Human3.6M dataset demonstrate that our approach improves upon the prior art by 23.1% and also outperforms several weakly supervised approaches that rely on 3D annotations. Our project is available at https://sites.google.com/view/ambiguity-aware-hpe.
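The abstract names constant bone length and motion smoothness as the temporal constraints that previous methods typically use. As a rough illustration only, the sketch below shows one common way such constraints can be written as penalties over a sequence of 3D poses. It is not the formulation used in this paper, which the abstract does not specify; the 5-joint chain skeleton `BONES` and the function names are hypothetical and chosen purely for the example.

```python
import numpy as np

# Hypothetical 5-joint chain skeleton (an assumption for illustration):
# each pair is (parent joint index, child joint index).
BONES = [(0, 1), (1, 2), (2, 3), (3, 4)]


def bone_lengths(poses, bones=BONES):
    """Return per-frame bone lengths, shape (T, B), for poses of shape (T, J, 3)."""
    parents = np.array([p for p, _ in bones])
    children = np.array([c for _, c in bones])
    return np.linalg.norm(poses[:, children] - poses[:, parents], axis=-1)


def bone_length_consistency(poses, bones=BONES):
    """Mean over bones of the variance of each bone's length across time.

    The value is (numerically) zero when every bone keeps the same length
    in all frames.
    """
    return float(bone_lengths(poses, bones).var(axis=0).mean())


def motion_smoothness(poses):
    """Mean squared displacement of each joint between consecutive frames."""
    diffs = np.diff(poses, axis=0)  # shape (T - 1, J, 3)
    return float((diffs ** 2).sum(axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base_pose = rng.normal(size=(5, 3))       # one random 5-joint pose
    velocity = np.array([0.1, 0.0, 0.0])      # rigid translation per frame
    steps = np.arange(10.0)[:, None, None]    # frame indices, shape (10, 1, 1)
    sequence = base_pose[None] + steps * velocity
    print("bone-length penalty:", bone_length_consistency(sequence))
    print("smoothness penalty:", motion_smoothness(sequence))
```

In this toy sequence the skeleton only translates rigidly, so the bone-length penalty is essentially zero while the smoothness penalty reflects the per-frame displacement. In an unsupervised setting, penalties of this kind would be applied to the predicted 3D poses rather than to ground truth.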

Related Material

[bibtex]

@InProceedings{Yu_2021_ICCV,
    author    = {Yu, Zhenbo and Ni, Bingbing and Xu, Jingwei and Wang, Junjie and Zhao, Chenglong and Zhang, Wenjun},
    title     = {Towards Alleviating the Modeling Ambiguity of Unsupervised Monocular 3D Human Pose Estimation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {8651-8660}
}