Cross-View Self-Fusion for Self-Supervised 3D Human Pose Estimation in the Wild

Hyun-Woo Kim, Gun-Hee Lee, Myeong-Seok Oh, Seong-Whan Lee; Proceedings of the Asian Conference on Computer Vision (ACCV), 2022, pp. 1385-1402

Abstract


Human pose estimation methods have recently shown remarkable results with supervised learning, which requires large amounts of labeled training data. However, such training data does not exist for a wide range of human activities, since 3D annotations are acquired with traditional motion capture systems that usually require a controlled indoor environment. To address this issue, we propose a self-supervised approach that learns a monocular 3D human pose estimator from unlabeled multi-view images by exploiting multi-view consistency constraints. Furthermore, we refine inaccurate 2D poses, which adversely affect 3D pose predictions, using the properties of a canonical space without relying on camera calibration. Since we do not require camera calibration to leverage the multi-view information, the network can be trained on images captured in in-the-wild environments. The key idea is to fuse the 2D observations across views and to combine the predictions obtained from them so that multi-view consistency is satisfied during training. We outperform state-of-the-art self-supervised methods on the two benchmark datasets Human3.6M and MPI-INF-3DHP as well as on the in-the-wild dataset SkiPose.
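
As a rough illustration of the multi-view consistency idea mentioned above, the sketch below (not the authors' code; the PyTorch framing, function name, and tensor shapes are all assumptions) aligns root-relative 3D predictions from two uncalibrated views with a Kabsch/Procrustes rotation and penalizes the remaining disagreement, which is one common way a calibration-free consistency loss can be formulated.

```python
import torch

def multiview_consistency_loss(pose_a: torch.Tensor, pose_b: torch.Tensor) -> torch.Tensor:
    """Hypothetical consistency loss between two views.

    pose_a, pose_b: (B, J, 3) root-relative 3D joint predictions of the same
    person from two different views. Without calibration, the two predictions
    should agree only up to a rigid rotation, so we align them first.
    """
    # Center both predictions so the alignment reduces to a pure rotation.
    a = pose_a - pose_a.mean(dim=1, keepdim=True)
    b = pose_b - pose_b.mean(dim=1, keepdim=True)

    # Batched Kabsch: optimal rotation R aligning b to a via SVD of the covariance.
    h = b.transpose(1, 2) @ a                        # (B, 3, 3)
    u, _, vt = torch.linalg.svd(h)
    d = torch.sign(torch.linalg.det(vt.transpose(1, 2) @ u.transpose(1, 2)))
    s = torch.diag_embed(torch.stack([torch.ones_like(d), torch.ones_like(d), d], dim=-1))
    r = vt.transpose(1, 2) @ s @ u.transpose(1, 2)   # (B, 3, 3), proper rotation

    # Consistency term: the rotated view-b prediction should match view-a.
    b_aligned = (r @ b.transpose(1, 2)).transpose(1, 2)
    return torch.mean((a - b_aligned) ** 2)
```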

Related Material


[bibtex]
@InProceedings{Kim_2022_ACCV,
    author    = {Kim, Hyun-Woo and Lee, Gun-Hee and Oh, Myeong-Seok and Lee, Seong-Whan},
    title     = {Cross-View Self-Fusion for Self-Supervised 3D Human Pose Estimation in the Wild},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2022},
    pages     = {1385-1402}
}