PlaneTR: Structure-Guided Transformers for 3D Plane Recovery

Bin Tan, Nan Xue, Song Bai, Tianfu Wu, Gui-Song Xia; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 4186-4195


This paper presents PlaneTR, a neural network built upon Transformers that simultaneously detects and reconstructs planes from a single image. Unlike previous methods, PlaneTR jointly leverages context information and geometric structures in a sequence-to-sequence manner to holistically detect plane instances in a single forward pass. Specifically, we represent the geometric structures as line segments and construct the network from three main components: (i) context and line-segment encoders, (ii) a structure-guided plane decoder, and (iii) a pixel-wise plane embedding decoder. Given an image and its detected line segments, PlaneTR generates the context and line-segment sequences via two specially designed encoders and then feeds them into a Transformer-based decoder that directly predicts a sequence of plane instances by considering the context and global structure cues simultaneously. Finally, pixel-wise embeddings are computed, and each pixel is assigned to the predicted plane instance nearest to it in the embedding space. Comprehensive experiments demonstrate that PlaneTR achieves state-of-the-art performance on the ScanNet and NYUv2 datasets.
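The final step described above, assigning every pixel to the nearest predicted plane in embedding space, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each predicted plane instance comes with its own embedding vector of the same dimension as the pixel embeddings, and that "nearest" means Euclidean (L2) distance. The function names `euclidean` and `assign_pixels_to_planes` are hypothetical, and plain Python lists stand in for the network's tensors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_pixels_to_planes(
    pixel_embeddings: List[List[Vector]],
    plane_embeddings: List[Vector],
) -> List[List[int]]:
    """Return an H x W map of plane indices.

    Each pixel is labeled with the index of the predicted plane whose
    embedding is closest to the pixel's embedding. The L2 metric is an
    assumption of this sketch; ties go to the lowest plane index.
    """
    if not plane_embeddings:
        raise ValueError("need at least one predicted plane")
    segmentation: List[List[int]] = []
    for row in pixel_embeddings:
        labels: List[int] = []
        for pixel in row:
            # Distance from this pixel to every predicted plane embedding.
            distances = [euclidean(pixel, plane) for plane in plane_embeddings]
            # Per-pixel argmin; min() keeps the first index on ties.
            labels.append(min(range(len(distances)), key=distances.__getitem__))
        segmentation.append(labels)
    return segmentation
```

For example, with two plane embeddings `[0.0, 0.0]` and `[1.0, 1.0]` and a 2 x 2 map of pixel embeddings `[[[0.1, 0.0], [0.9, 1.2]], [[0.2, 0.1], [0.6, 0.7]]]`, the function returns `[[0, 1], [0, 1]]`. Since the decoder has already produced the set of plane instances, this step reduces to a per-pixel argmin over that set.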

Related Material

@InProceedings{Tan_2021_ICCV,
  author    = {Tan, Bin and Xue, Nan and Bai, Song and Wu, Tianfu and Xia, Gui-Song},
  title     = {PlaneTR: Structure-Guided Transformers for 3D Plane Recovery},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2021},
  pages     = {4186-4195}
}