Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement

Jian Wang, Zhe Cao, Diogo Luvizon, Lingjie Liu, Kripasindhu Sarkar, Danhang Tang, Thabo Beeler, Christian Theobalt; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 777-787

Abstract


In this work, we explore egocentric whole-body motion capture using a single fisheye camera, which simultaneously estimates human body and hand motion. This task presents significant challenges due to three factors: the lack of high-quality datasets, fisheye camera distortion, and human body self-occlusion. To address these challenges, we propose a novel approach that leverages FisheyeViT to extract fisheye image features, which are subsequently converted into pixel-aligned 3D heatmap representations for 3D human body pose prediction. For hand tracking, we incorporate dedicated hand detection and hand pose estimation networks for regressing 3D hand poses. Finally, we develop a diffusion-based whole-body motion prior model to refine the estimated whole-body motion while accounting for joint uncertainties. To train these networks, we collect a large synthetic dataset, EgoWholeBody, comprising 840,000 high-quality egocentric images captured across a diverse range of whole-body motion sequences. Quantitative and qualitative evaluations demonstrate the effectiveness of our method in producing high-quality whole-body motion estimates from a single egocentric camera.

Related Material


@InProceedings{Wang_2024_CVPR,
    author    = {Wang, Jian and Cao, Zhe and Luvizon, Diogo and Liu, Lingjie and Sarkar, Kripasindhu and Tang, Danhang and Beeler, Thabo and Theobalt, Christian},
    title     = {Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {777-787}
}