VGGT-ohm

Jianyuan Wang, Minghao Chen, Shangzhan Zhang, Nikita Karaev, Johannes Schönberger, Patrick Labatut, Piotr Bojanowski, David Novotny, Andrea Vedaldi, Christian Rupprecht; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 21486-21499

Abstract


We introduce VGGT-O, a feed-forward model for 3D reconstruction that improves accuracy, efficiency, and capabilities for both static and dynamic scenes. Prior models such as VGGT have shown that feed-forward 3D reconstruction is, in many cases, competitive with traditional optimization-based methods. Here, we show that the accuracy and robustness of these models scale predictably with model capacity and data size. To enable training 3D reconstruction models at an unprecedented scale, we introduce architectural changes that improve training efficiency and scalability, a high-quality data annotation pipeline that supports dynamic scenes, and a self-supervised learning protocol. We significantly simplify VGGT's architecture by using a single dense prediction head with multi-task supervision, removing expensive high-resolution convolutional layers, and introducing efficient scene tokens for feature aggregation in lieu of global attention. These changes allow us to train VGGT-O with 15× more supervised data than prior work and to leverage vast amounts of unlabeled videos, while requiring only ~30% of VGGT's training memory. VGGT-O achieves strong results for 3D reconstruction of static and dynamic scenes across multiple benchmarks, e.g., improving over the previous best camera estimation accuracy by 77% on Sintel.

Related Material


@InProceedings{Wang_2026_CVPR,
    author    = {Wang, Jianyuan and Chen, Minghao and Zhang, Shangzhan and Karaev, Nikita and Sch\"onberger, Johannes and Labatut, Patrick and Bojanowski, Piotr and Novotny, David and Vedaldi, Andrea and Rupprecht, Christian},
    title     = {VGGT-ohm},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {21486-21499}
}