Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection From Point Clouds

Chenhang He, Ruihuang Li, Shuai Li, Lei Zhang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 8417-8427

Abstract


Transformers have demonstrated promising performance in many 2D vision tasks. However, it is cumbersome to apply the self-attention that underlies transformers to large-scale point cloud data, because a point cloud is a long sequence and is unevenly distributed in 3D space. To address this issue, existing methods usually compute self-attention locally, either by grouping the points into clusters of the same size or by performing convolutional self-attention on a discretized representation. However, the former results in stochastic point dropout, while the latter typically has a narrow attention field. In this paper, we propose a novel voxel-based architecture, namely Voxel Set Transformer (VoxSeT), to detect 3D objects from point clouds by means of set-to-set translation. VoxSeT is built upon a voxel-based set attention (VSA) module, which reduces the self-attention in each voxel to two cross-attentions and models features in a hidden space induced by a group of latent codes. With the VSA module, VoxSeT can manage voxelized point clusters of arbitrary size over a wide range and process them in parallel with linear complexity. The proposed VoxSeT integrates the high performance of transformers with the efficiency of voxel-based models, and can serve as a good alternative to convolutional and point-based backbones. VoxSeT achieves competitive results on the KITTI and Waymo detection benchmarks. The source code of VoxSeT will be released.
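
To make the mechanism concrete, below is a minimal PyTorch sketch of set attention factored through latent codes, in the spirit of the VSA module described above: K learnable latent codes first cross-attend to the N point features of a voxel (cost O(NK)), and the points then cross-attend back to the resulting hidden features (again O(NK)), giving linear rather than quadratic complexity in N. This is an illustrative assumption of the general pattern, not the authors' released implementation; the class name InducedSetAttention, the head count, and the padded (B, N, dim) batching are hypothetical (the paper's VSA additionally scatters points into voxels of varying size).

import torch
import torch.nn as nn

class InducedSetAttention(nn.Module):
    """Set attention via two cross-attentions with K latent codes.

    A sketch of the latent-code pattern assumed above: each voxel's
    self-attention, O(N^2), is replaced by two O(N*K) cross-attentions.
    """

    def __init__(self, dim: int, num_latents: int = 8, num_heads: int = 4):
        super().__init__()
        # K learnable latent codes that induce the hidden space
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.cross1 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross2 = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) point features, B voxels padded to N points each
        B = x.size(0)
        lat = self.latents.unsqueeze(0).expand(B, -1, -1)  # (B, K, dim)
        # Cross-attention 1: latents attend to points -> hidden features
        hidden, _ = self.cross1(lat, x, x)                 # (B, K, dim)
        # Cross-attention 2: points attend back to the hidden features
        out, _ = self.cross2(x, hidden, hidden)            # (B, N, dim)
        return out

# Usage: two voxels of 100 points with 64-dim features
vsa = InducedSetAttention(dim=64)
points = torch.randn(2, 100, 64)
print(vsa(points).shape)  # torch.Size([2, 100, 64])

Because the point features interact only through the K latent codes, the cost grows linearly with the number of points per voxel, which is what allows voxels of widely varying population to be processed in parallel without fixed-size grouping or point dropout.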

Related Material


[bibtex]
@InProceedings{He_2022_CVPR,
    author    = {He, Chenhang and Li, Ruihuang and Li, Shuai and Zhang, Lei},
    title     = {Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection From Point Clouds},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {8417-8427}
}