Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos

Chengbo Yuan, Geng Chen, Li Yi, Yang Gao; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 8863-8874

Abstract


Egocentric videos provide valuable insights into human interactions with the physical world, sparking growing interest in the computer vision and robotics communities. A critical challenge in fully understanding the geometry and dynamics of egocentric videos is dense scene reconstruction. However, the lack of high-quality labeled datasets in this field has hindered the effectiveness of current supervised learning methods. In this work, we aim to address this issue by exploring a self-supervised dynamic scene reconstruction approach. We introduce **EgoMono4D**, a novel model that unifies the estimation of multiple variables necessary for *Ego*centric *Mono*cular *4D* reconstruction, including camera intrinsics, camera poses, and video depth, all within a fast feed-forward framework. Starting from a pretrained single-frame depth and intrinsics estimation model, we extend it with camera pose estimation and align multi-frame results on large-scale unlabeled egocentric videos. We evaluate EgoMono4D in both in-domain and zero-shot generalization settings, achieving superior performance in dense point cloud sequence reconstruction compared to all baselines. EgoMono4D represents the first attempt to apply self-supervised learning for point cloud sequence reconstruction to the label-scarce egocentric field, enabling fast, dense, and generalizable reconstruction. The interactive visualization, code, and trained models are released at https://egomono4d.github.io/.
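
As a rough illustration of how the jointly estimated quantities (video depth, shared camera intrinsics, and per-frame camera poses) combine into a 4D reconstruction, the following is a minimal sketch of standard depth unprojection; the function names and array shapes are assumptions for illustration, not the authors' API.

```python
import numpy as np

def unproject_frame(depth, K, T_cam2world):
    """Back-project one H x W depth map into a world-frame point cloud.

    depth:        (H, W) predicted depth map for this frame
    K:            (3, 3) camera intrinsics (shared across the video)
    T_cam2world:  (4, 4) camera-to-world pose for this frame
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))        # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # (H, W, 3) homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                       # camera-space rays
    pts_cam = rays * depth[..., None]                     # scale rays by depth
    pts_hom = np.concatenate([pts_cam, np.ones((H, W, 1))], axis=-1)
    pts_world = pts_hom.reshape(-1, 4) @ T_cam2world.T    # transform to world frame
    return pts_world[:, :3]                               # (H*W, 3) points

def reconstruct_sequence(depths, K, poses):
    """Stack per-frame unprojections into a dynamic point cloud sequence."""
    return [unproject_frame(d, K, T) for d, T in zip(depths, poses)]
```

In this reading, the self-supervised alignment objective encourages the per-frame point clouds produced this way to be geometrically consistent across frames, without requiring ground-truth depth or pose labels.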

Related Material


@InProceedings{Yuan_2025_ICCV,
    author    = {Yuan, Chengbo and Chen, Geng and Yi, Li and Gao, Yang},
    title     = {Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {8863-8874}
}