Forecasting of 3D Whole-body Human Poses with Grasping Objects

Haitao Yan, Qiongjie Cui, Jiexin Xie, Shijie Guo; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 1726-1736

Abstract


In the context of computer vision and human-robot interaction, forecasting 3D human poses is crucial for understanding human behavior and enhancing the predictive capabilities of intelligent systems. While existing methods have made significant progress, they often focus on predicting major body joints, overlooking fine-grained gestures and their interaction with objects. Human hand movements, particularly during object interactions, play a pivotal role and provide more precise expressions of human poses. This work fills this gap and introduces a novel paradigm: forecasting 3D whole-body human poses with a focus on grasping objects. This task involves predicting activities across all joints in the body and hands, encompassing the complexities of internal heterogeneity and external interactivity. To tackle these challenges, we also propose a novel approach, C^3HOST (cross-context cross-modal consolidation for 3D whole-body pose forecasting), which effectively handles the complexities of internal heterogeneity and external interactivity. C^3HOST involves distinct steps, including heterogeneous content encoding and alignment, and cross-modal feature learning and interaction. These enable us to predict activities across all body and hand joints, ensuring high-precision whole-body human pose prediction, even during object grasping. Extensive experiments on two benchmarks demonstrate that our model significantly enhances the accuracy of whole-body human motion prediction. The project page is available at https://sites.google.com/view/c3host.

Related Material


[pdf]
[bibtex]
@InProceedings{Yan_2024_CVPR,
    author    = {Yan, Haitao and Cui, Qiongjie and Xie, Jiexin and Guo, Shijie},
    title     = {Forecasting of 3D Whole-body Human Poses with Grasping Objects},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {1726-1736}
}