Light3R-SfM: Towards Feed-forward Structure-from-Motion

Elflein, Sven; Zhou, Qunjie; Leal-Taixé, Laura

Sven Elflein, Qunjie Zhou, Laura Leal-Taixé; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 16774-16784

Abstract

We present Light3R-SfM, a feed-forward, end-to-end learnable framework for efficient large-scale Structure-from-Motion (SfM) from unconstrained image collections. Unlike existing SfM solutions that rely on costly matching and global optimization to achieve accurate 3D reconstructions, Light3R-SfM addresses this limitation through a novel latent global alignment module. This module replaces traditional global optimization with a learnable attention mechanism, effectively capturing multi-view constraints across images for robust and precise camera pose estimation. Light3R-SfM constructs a sparse scene graph via a retrieval-score-guided shortest path tree, dramatically reducing memory usage and computational overhead compared to the naive approach. Extensive experiments demonstrate that Light3R-SfM achieves competitive accuracy while significantly reducing runtime, making it well suited to real-world 3D reconstruction tasks with runtime constraints.
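
As a rough illustration of the sparse scene graph idea, the sketch below builds a shortest path tree over a pairwise image-similarity matrix using Dijkstra's algorithm. It is not the authors' implementation: the function name `shortest_path_tree`, the edge cost `1 - similarity`, the choice of image 0 as root, and the toy similarity values are all assumptions made for this example. The point it shows is that a tree over N images keeps only N - 1 image pairs, whereas a fully connected graph would keep N(N - 1)/2.

```python
import heapq


def shortest_path_tree(sim, root=0):
    """Return (parent, child) edges of a shortest path tree.

    sim  : square list of lists with pairwise retrieval similarities in [0, 1]
    root : index of the image used as the tree root (an assumption here)
    """
    n = len(sim)
    dist = [float("inf")] * n
    parent = [None] * n
    done = [False] * n
    dist[root] = 0.0
    heap = [(0.0, root)]

    while heap:
        d, u = heapq.heappop(heap)
        if done[u]:
            continue  # stale heap entry
        done[u] = True
        for v in range(n):
            if v == u or done[v]:
                continue
            # Assumed edge cost: more similar image pairs are cheaper to traverse.
            cost = 1.0 - sim[u][v]
            if d + cost < dist[v]:
                dist[v] = d + cost
                parent[v] = u
                heapq.heappush(heap, (dist[v], v))

    return [(parent[v], v) for v in range(n) if parent[v] is not None]


# Toy similarity matrix for four images (made-up values).
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.3],
    [0.2, 0.8, 1.0, 0.9],
    [0.1, 0.3, 0.9, 1.0],
]
edges = shortest_path_tree(sim, root=0)
dense_pairs = len(sim) * (len(sim) - 1) // 2
print(edges, "kept", len(edges), "of", dense_pairs, "pairs")
```

In this toy case the tree chains each image to its most similar neighbour, so only three of the six possible pairs would need to be processed.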

Related Material

[bibtex]

@InProceedings{Elflein_2025_CVPR,
    author    = {Elflein, Sven and Zhou, Qunjie and Leal-Taix\'e, Laura},
    title     = {Light3R-SfM: Towards Feed-forward Structure-from-Motion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {16774-16784}
}