3D-LFM: Lifting Foundation Model

Mosam Dabhi, László A. Jeni, Simon Lucey; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 10466-10475

Abstract


The lifting of a 3D structure and camera from 2D landmarks is at the cornerstone of the discipline of computer vision. Traditional methods have been confined to specific rigid objects such as those in Perspective-n-Point (PnP) problems but deep learning has expanded our capability to reconstruct a wide range of object classes (e.g. C3DPO [??] and PAUL [??]) with resilience to noise occlusions and perspective distortions. However all these techniques have been limited by the fundamental need to establish correspondences across the 3D training data significantly limiting their utility to applications where one has an abundance of "in-correspondence" 3D data. Our approach harnesses the inherent permutation equivariance of transformers to manage varying numbers of points per 3D data instance withstands occlusions and generalizes to unseen categories. We demonstrate state-of-the-art performance across 2D-3D lifting task benchmarks. Since our approach can be trained across such a broad class of structures we refer to it simply as a 3D Lifting Foundation Model (3D-LFM) -- the first of its kind.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Dabhi_2024_CVPR, author = {Dabhi, Mosam and Jeni, L\'aszl\'o A. and Lucey, Simon}, title = {3D-LFM: Lifting Foundation Model}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {10466-10475} }