GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

Xu, Tian-Xing; Gao, Xiangjun; Hu, Wenbo; Li, Xiaoyu; Zhang, Song-Hai; Shan, Ying

Tian-Xing Xu, Xiangjun Gao, Wenbo Hu, Xiaoyu Li, Song-Hai Zhang, Ying Shan; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 6632-6644

Abstract

Despite remarkable advancements in video depth estimation, existing methods fall short in geometric fidelity due to their affine-invariant predictions, restricting their applicability in reconstruction and other metrically grounded downstream tasks. We propose a novel point map Variational Autoencoder (VAE) for encoding and decoding unbounded point maps. Notably, its latent space is agnostic to video latent distributions of video diffusion models, allowing us to leverage generation priors to model the distribution of point map sequences conditioned on the input videos. Thus, we can recover high-fidelity point map sequences with temporal coherence from open-world videos, facilitating accurate 3D/4D reconstruction, camera parameter estimation, and other depth-based applications. Extensive evaluations on diverse datasets demonstrate that our method achieves state-of-the-art 3D accuracy, temporal consistency, and generalization capability.

Related Material

@InProceedings{Xu_2025_ICCV,
    author    = {Xu, Tian-Xing and Gao, Xiangjun and Hu, Wenbo and Li, Xiaoyu and Zhang, Song-Hai and Shan, Ying},
    title     = {GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {6632-6644}
}