2D-LFM: Lifting Foundation Model without 3D Supervision

Dabhi, Mosam; Gill, Irhas; Jeni, László A.; Lucey, Simon

Mosam Dabhi, Irhas Gill, László A. Jeni, Simon Lucey; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 34303-34311

Abstract

Recent vision foundation models give the impression that 3D reconstruction from RGB is largely solved. Yet these systems struggle with object-specific 3D structure: the fine-grained geometry implied by an object's landmarks or skeleton. In this paper, we show that when a model is given only 2D landmarks, it can recover more accurate 3D structure than state-of-the-art depth-from-RGB foundation models. Classical lifting approaches such as PAUL demonstrate this principle but do not scale beyond single categories, while methods like 3D-LFM scale but require extensive 3D supervision. We present the first lifting foundation model that learns object-specific 3D geometry using only 2D supervision. The key idea is to inject correspondence structure into the model via a positional encoding inspired by classical structure-from-motion. This simple inductive bias enables robust, object-agnostic 3D lifting that rivals or exceeds recent 3D-supervised approaches, revealing that landmark-based lifting remains a powerful and under-exploited paradigm for 3D understanding.
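The abstract's premise is that 2D landmark correspondences alone constrain 3D structure. Classical structure-from-motion rests on the same premise. The sketch below illustrates only that classical principle. It is not the 2D-LFM model, its positional encoding, or PAUL. It applies Tomasi-Kanade-style affine factorization to synthetic orthographic tracks of a single rigid object. The helper names `factorize_affine` and `synthetic_tracks`, and the random data, are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating classical affine factorization;
# not the paper's method.

def factorize_affine(W):
    """Rank-3 factorization of a 2F x P matrix of 2D landmark tracks.

    Rows come in (x, y) pairs per frame; columns are landmarks. Returns a
    motion factor M (2F x 3), a shape factor S (3 x P) with M @ S equal to
    the row-centered tracks, and the singular values. S is recovered only
    up to an invertible 3x3 (affine) ambiguity.
    """
    W_centered = W - W.mean(axis=1, keepdims=True)  # remove per-frame translation
    U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
    root = np.sqrt(s[:3])
    M = U[:, :3] * root         # 2F x 3 camera/motion factor
    S = root[:, None] * Vt[:3]  # 3 x P shape factor
    return M, S, s


def synthetic_tracks(num_frames=10, num_points=15, seed=0):
    """Project random 3D landmarks through random orthographic cameras."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((3, num_points))  # hypothetical 3D landmarks
    rows = []
    for _ in range(num_frames):
        Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix
        R = Q[:2]                                          # two orthonormal rows: orthographic camera
        t = rng.standard_normal((2, 1))                    # per-frame 2D translation
        rows.append(R @ X + t)
    return np.vstack(rows), X


if __name__ == "__main__":
    W, X = synthetic_tracks()
    M, S, s = factorize_affine(W)
    residual = np.linalg.norm(W - W.mean(axis=1, keepdims=True) - M @ S)
    print("leading singular values:", np.round(s[:5], 4))
    print("rank-3 residual:", residual)
```

Centering each row removes the per-frame translation, so the noise-free tracks have rank at most 3. The truncated SVD then recovers the landmark shape up to an invertible 3x3 transform. Classical methods resolve that ambiguity with camera orthonormality constraints (a metric upgrade), which this sketch omits. The rigid, multi-view setting shown here is also far narrower than the object-agnostic, single-view lifting the paper targets.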

Related Material


@InProceedings{Dabhi_2026_CVPR,
    author    = {Dabhi, Mosam and Gill, Irhas and Jeni, L\'aszl\'o A. and Lucey, Simon},
    title     = {2D-LFM: Lifting Foundation Model without 3D Supervision},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {34303-34311}
}