Estimating Body and Hand Motion in an Ego-sensed World

Yi, Brent; Ye, Vickie; Zheng, Maya; Li, Yunqi; Müller, Lea; Pavlakos, Georgios; Ma, Yi; Malik, Jitendra; Kanazawa, Angjoo

Brent Yi, Vickie Ye, Maya Zheng, Yunqi Li, Lea Müller, Georgios Pavlakos, Yi Ma, Jitendra Malik, Angjoo Kanazawa; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 7072-7084

Abstract

We present EgoAllo, a system for human motion estimation from a head-mounted device. Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters that capture a device wearer's actions in the allocentric coordinate frame of the scene. To achieve this, our key insight is in representation: we propose spatial and temporal invariance criteria for improving model performance, from which we derive a head motion conditioning parameterization that improves estimation by up to 18%. We also show how the bodies estimated by our system can improve hand estimation: the resulting kinematic and temporal constraints can reduce world-frame errors in single-frame estimates by 40%.

Related Material

[pdf] [supp] [arXiv]

[bibtex]

@InProceedings{Yi_2025_CVPR, author = {Yi, Brent and Ye, Vickie and Zheng, Maya and Li, Yunqi and M\"uller, Lea and Pavlakos, Georgios and Ma, Yi and Malik, Jitendra and Kanazawa, Angjoo}, title = {Estimating Body and Hand Motion in an Ego-sensed World}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2025}, pages = {7072-7084} }