Unsupervised 3D Pose Estimation With Non-Rigid Structure-From-Motion Modeling

Haorui Ji, Hui Deng, Yuchao Dai, Hongdong Li; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 3314-3323

Abstract


Most existing 3D human pose estimation works rely heavily on the powerful memory capability of networks to obtain suitable 2D-3D mappings from the training data. Few works have studied the modeling of human posture deformation in motion. In this paper, we propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior. Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton and a frame-by-frame skeleton deformation. A mixed spatial-temporal NRSfMformer simultaneously estimates the 3D reference skeleton and the skeleton deformation of each frame from a sequence of 2D observations; the two are then summed to obtain the pose of each frame. Subsequently, a loss term based on the diffusion model is used to ensure that the pipeline learns the correct motion prior. Finally, we evaluate the proposed method on mainstream datasets and obtain results that outperform the state of the art.
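The decomposition described in the abstract (per-frame pose = shared reference skeleton + per-frame deformation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `compose_poses`, the nested-list data layout, and the toy two-joint skeleton are assumptions made purely for illustration, and the network that would predict the reference skeleton and the deformations is not shown.

```python
from typing import List

Joint = List[float]     # [x, y, z] coordinates of one joint
Skeleton = List[Joint]  # one entry per joint


def compose_poses(reference: Skeleton, deformations: List[Skeleton]) -> List[Skeleton]:
    """Add a shared reference skeleton to each frame's deformation.

    Hypothetical helper: `reference` holds J joints, `deformations` holds
    T frames of J joint offsets. Returns T poses of J joints each.
    """
    poses = []
    for t, frame in enumerate(deformations):
        if len(frame) != len(reference):
            raise ValueError(
                f"frame {t}: expected {len(reference)} joints, got {len(frame)}"
            )
        # Joint-wise, coordinate-wise sum of reference and deformation.
        poses.append(
            [
                [r + d for r, d in zip(ref_joint, def_joint)]
                for ref_joint, def_joint in zip(reference, frame)
            ]
        )
    return poses


# Toy data (assumed): a two-joint skeleton observed over two frames.
reference = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
deformations = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],    # frame 0: no deformation
    [[0.0, 0.0, 0.0], [0.5, -0.25, 0.0]],  # frame 1: second joint moves
]
poses = compose_poses(reference, deformations)
```

In this toy example the first frame reproduces the reference skeleton exactly, since its deformation is zero, while the second frame displaces only the second joint.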

Related Material


[bibtex]
@InProceedings{Ji_2024_WACV,
    author    = {Ji, Haorui and Deng, Hui and Dai, Yuchao and Li, Hongdong},
    title     = {Unsupervised 3D Pose Estimation With Non-Rigid Structure-From-Motion Modeling},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {3314-3323}
}