@InProceedings{Kolodiazhnyi_2024_CVPR,
    author    = {Kolodiazhnyi, Maxim and Vorontsova, Anna and Konushin, Anton and Rukhovich, Danila},
    title     = {OneFormer3D: One Transformer for Unified Point Cloud Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {20943-20953}
}
OneFormer3D: One Transformer for Unified Point Cloud Segmentation
Abstract
Semantic, instance, and panoptic segmentation of 3D point clouds have been addressed using task-specific models of distinct design. Thereby, the similarity of all segmentation tasks and the implicit relationship between them have not been utilized effectively. This paper presents a unified, simple, and effective model addressing all these tasks jointly. The model, named OneFormer3D, performs instance and semantic segmentation consistently, using a group of learnable kernels, where each kernel is responsible for generating a mask for either an instance or a semantic category. These kernels are trained with a transformer-based decoder with unified instance and semantic queries passed as an input. Such a design enables training a model end-to-end in a single run, so that it achieves top performance on all three segmentation tasks simultaneously. Specifically, our OneFormer3D ranks 1st and sets a new state-of-the-art (+2.1 mAP50) on the ScanNet test leaderboard. We also demonstrate state-of-the-art results in semantic, instance, and panoptic segmentation of the ScanNet (+21 PQ), ScanNet200 (+3.8 mAP50), and S3DIS (+0.8 mIoU) datasets.
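The core idea of the abstract — a single set of learnable kernels where each kernel produces a per-point mask for either an instance or a semantic category — can be illustrated with a minimal NumPy sketch. This is a hypothetical toy, not the authors' implementation: all names, shapes, and the dot-product mask head are illustrative assumptions.

```python
import numpy as np

# Toy sketch of the unified-query idea (assumed, not the paper's code):
# one kernel set yields both instance and semantic masks.
rng = np.random.default_rng(0)

num_points = 1000      # points in the cloud
feat_dim = 32          # per-point feature dimension
num_instances = 10     # instance queries (illustrative count)
num_classes = 20       # semantic-category queries (illustrative count)

# Per-point features (in the paper these come from a 3D backbone).
point_feats = rng.standard_normal((num_points, feat_dim))

# Unified kernel set: instance kernels followed by semantic kernels.
# In the paper these are refined by a transformer decoder; here they
# are just random vectors to show the mask-generation step.
kernels = rng.standard_normal((num_instances + num_classes, feat_dim))

# Each kernel predicts a mask logit for every point via a dot product,
# then the logits are binarised into masks.
mask_logits = point_feats @ kernels.T                # (num_points, 30)
masks = 1.0 / (1.0 + np.exp(-mask_logits)) > 0.5

# The first columns are instance masks, the rest semantic masks.
instance_masks = masks[:, :num_instances]
semantic_masks = masks[:, num_instances:]
print(instance_masks.shape, semantic_masks.shape)
```

Splitting one kernel set by index is what lets a single forward pass serve instance, semantic, and (by combining the two) panoptic segmentation.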