3D-Object Perception Transformer (3PT)

Agastya Kalra, Tim Salzmann, Guy Stoppi, Dmitrii Marin, Rishav Agarwal, Vage Taamazyan, Martin Bokeloh, Stefan Hinterstoisser, Anton Boykov, Alberto Dall'Olio, Pravin Dangol, Kartik Venkataraman, Huaijin Chen; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 25777-25787

Abstract

Current approaches to zero-shot 3D-object perception typically rely on ensembles of frozen foundation models. This limits deep object understanding and cross-domain generalization, making performance inadequate for real-world deployment. The 3D-Object Perception Transformer (3PT) addresses this limitation by unifying detection, segmentation, and 6DoF pose estimation in a single framework trained directly for 3D-object perception. Built on two large-scale trained transformers that specialize in 2D and 3D object-centric scene understanding, respectively, 3PT continuously refines its object representations without depth input and enhances 3D understanding by incorporating multi-view information. 3PT achieves state-of-the-art detection and pose estimation on the diverse BOP benchmarks, surpassing task-specialized models, often by double-digit percentage improvements, and in some cases outperforming non-zero-shot methods. It also ranked first in 7 of 11 tracks at the BOP Challenge 2025. 3PT's high accuracy and reliability make it well suited to practical industrial robotics applications such as bin picking and precise insertion. The project page is at https://www.intrinsic.ai/publications/3pt-cvpr2026.

Related Material

@InProceedings{Kalra_2026_CVPR,
    author    = {Kalra, Agastya and Salzmann, Tim and Stoppi, Guy and Marin, Dmitrii and Agarwal, Rishav and Taamazyan, Vage and Bokeloh, Martin and Hinterstoisser, Stefan and Boykov, Anton and Dall'Olio, Alberto and Dangol, Pravin and Venkataraman, Kartik and Chen, Huaijin},
    title     = {3D-Object Perception Transformer (3PT)},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {25777-25787}
}