@InProceedings{Zust_2025_ICCV,
  author    = {Zust, Lojze and Cabon, Yohann and Marrie, Juliette and Antsfeld, Leonid and Chidlovskii, Boris and Revaud, Jerome and Csurka, Gabriela},
  title     = {PanSt3R: Multi-view Consistent Panoptic Segmentation},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025},
  pages     = {5856-5866}
}
PanSt3R: Multi-view Consistent Panoptic Segmentation
Abstract
Panoptic segmentation in 3D is a fundamental problem in scene understanding. Existing approaches typically rely on costly test-time optimization (often based on NeRF) to consolidate 2D predictions from off-the-shelf panoptic segmentation methods into 3D. In this work, we instead propose a unified and integrated approach, PanSt3R, which eliminates the need for test-time optimization by jointly predicting 3D geometry and multi-view-consistent panoptic segmentation in a single forward pass. Our approach harnesses the 3D representations of MUSt3R, a recent scalable multi-view version of DUSt3R, together with the 2D representations of DINOv2, and performs joint multi-view panoptic prediction via a mask transformer architecture. We additionally revisit the standard mask-merging post-processing procedure and introduce a more principled approach to multi-view segmentation. We also present a simple method for generating novel-view predictions from the outputs of PanSt3R and vanilla 3DGS. Overall, PanSt3R is conceptually simple, scalable, and achieves state-of-the-art performance on several benchmarks while being orders of magnitude faster than optimization-based alternatives.