Back to Optimization: Diffusion-Based Zero-Shot 3D Human Pose Estimation

Zhongyu Jiang, Zhuoran Zhou, Lei Li, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 6142-6152

Abstract


Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods. Nonetheless, 3D HPE in the wild is still the biggest challenge of learning-based models, whether with 2D-3D lifting, image-to-3D, or diffusion-based methods, since the trained networks implicitly learn camera intrinsic parameters and domain-based 3D human pose distributions and estimate poses by statistical average. On the other hand, the optimization-based methods estimate results case-by-case, which can predict more diverse and sophisticated human poses in the wild. By combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M as minMPJPE 51.4mm without training with any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO achieves SOTA performance on 3DPW dataset with PA-MPJPE 42.6mm on cross-dataset evaluation, which even outperforms learning-based methods trained on 3DPW.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Jiang_2024_WACV, author = {Jiang, Zhongyu and Zhou, Zhuoran and Li, Lei and Chai, Wenhao and Yang, Cheng-Yen and Hwang, Jenq-Neng}, title = {Back to Optimization: Diffusion-Based Zero-Shot 3D Human Pose Estimation}, booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)}, month = {January}, year = {2024}, pages = {6142-6152} }