VGG-T$^3$: Offline Feed-Forward 3D Reconstruction at Scale

Elflein, Sven; Li, Ruilong; Agostinho, Sérgio; Gojcic, Zan; Leal-Taixé, Laura; Zhou, Qunjie; Osep, Aljosa

Sven Elflein, Ruilong Li, Sérgio Agostinho, Zan Gojcic, Laura Leal-Taixé, Qunjie Zhou, Aljosa Osep; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 36464-36474

Abstract

We present a scalable 3D reconstruction model that addresses a critical limitation in offline feed-forward methods: their computational and memory requirements grow quadratically w.r.t. the number of input images. Our approach is built on the key insight that this bottleneck stems from the varying-length Key-Value (KV) space representation of scene geometry, which we distill into a fixed-size Multi-Layer Perceptron (MLP) via test-time training. Our VGG-T$^3$ (Visual Geometry Grounded Test Time Training) scales linearly w.r.t. the number of input views, similar to online models, and reconstructs a 1k image collection in just 54 seconds, achieving an 11.6x speed-up over baselines that rely on softmax attention. Since our method retains global scene aggregation capability, it outperforms other linear-time methods by large margins in point map reconstruction error. Finally, we demonstrate visual localization capabilities of our model by querying the scene representation with unseen images.
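The idea of distilling a variable-length KV set into a fixed-size MLP can be illustrated with a toy sketch. This is not the authors' implementation: the two-layer tanh MLP, the mean-squared-error objective, the plain gradient-descent loop, the synthetic data, and all names (`fit_mlp_to_kv`, `mlp_forward`, `hidden`, `steps`, `lr`) are assumptions chosen only to show the general mechanism next to ordinary softmax attention.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Reference softmax attention; the score matrix has len(Q) x len(K) entries."""
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ V

def mlp_forward(params, X):
    """Two-layer tanh MLP; returns the output and the hidden activations."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    return H @ params["W2"] + params["b2"], H

def fit_mlp_to_kv(K, V, hidden=64, steps=300, lr=0.2, seed=0):
    """Fit an MLP so that mlp(K) approximates V (hypothetical toy objective)."""
    rng = np.random.default_rng(seed)
    d_in, d_out = K.shape[1], V.shape[1]
    params = {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, d_out)),
        "b2": np.zeros(d_out),
    }
    losses = []
    for _ in range(steps):
        pred, H = mlp_forward(params, K)
        err = pred - V
        losses.append(float(np.mean(err ** 2)))
        # Manual backpropagation of the mean squared error.
        g_out = 2.0 * err / err.size
        grads = {"W2": H.T @ g_out, "b2": g_out.sum(axis=0)}
        g_pre = (g_out @ params["W2"].T) * (1.0 - H ** 2)  # tanh derivative
        grads["W1"] = K.T @ g_pre
        grads["b1"] = g_pre.sum(axis=0)
        for name in params:
            params[name] -= lr * grads[name]
    return params, losses

# Synthetic "scene tokens": the MLP size is the same for 128 and 1024 tokens.
rng = np.random.default_rng(0)
dim = 16
A = rng.normal(size=(dim, dim)) / np.sqrt(dim)
param_counts = []
for n_tokens in (128, 1024):
    K = rng.normal(size=(n_tokens, dim))
    V = np.tanh(K @ A)
    params, losses = fit_mlp_to_kv(K, V)
    param_counts.append(sum(p.size for p in params.values()))

# Querying the fitted MLP never touches the original K and V.
Q = rng.normal(size=(5, dim))
out, _ = mlp_forward(params, Q)
```

In this sketch each training step visits every key-value pair once, so with a fixed number of steps the fitting cost grows linearly with the number of tokens, and the query cost depends only on the MLP size. Softmax attention, by contrast, compares every query against every key.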

Related Material


@InProceedings{Elflein_2026_CVPR,
    author    = {Elflein, Sven and Li, Ruilong and Agostinho, S\'ergio and Gojcic, Zan and Leal-Taix\'e, Laura and Zhou, Qunjie and Osep, Aljosa},
    title     = {VGG-T$^3$: Offline Feed-Forward 3D Reconstruction at Scale},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {36464-36474}
}