SWPT: Spherical Window-based Point Cloud Transformer

Xindong Guo, Yu Sun, Rong Zhao, Liqun Kuang, Xie Han; Proceedings of the Asian Conference on Computer Vision (ACCV), 2022, pp. 3034-3050


While the Transformer architecture has become the de-facto standard for natural language processing tasks and has shown promising prospects in image analysis domains, applying it to the 3D point cloud directly is still a challenge due to the irregularity and lack of order. Most current approaches adopt the farthest point searching as a downsampling method and construct local areas with the k-nearest neighbor strategy to extract features hierarchically. However, this scheme inevitably consumes lots of time and memory, which impedes its application to near-real-time systems and large-scale point cloud. This research designs a novel transformer-based network called Spherical Window-based Point Transformer (SWPT) for point cloud learning, which consists of a Spherical Projection module, a Spherical Window Transformer module and a crossing self-attention module. Specifically, we project the points on a spherical surface, then a window-based local self-attention is adopted to calculate the relationship between the points within a window. To obtain connections between different windows, the crossing self-attention is introduced, which rotates all the windows as a whole along the spherical surface and then aggregates the crossing features. It is inherently permutation invariant because of using simple and symmetric functions, making it suitable for point cloud processing. Extensive experiments demonstrate that SWPT can achieve the state-of-the-art performance with about 3-8 times faster than previous transformer-based methods on shape classification tasks, and achieve competitive results on part segmentation and the more difficult real-world classification tasks.

Related Material

[pdf] [code]
@InProceedings{Guo_2022_ACCV, author = {Guo, Xindong and Sun, Yu and Zhao, Rong and Kuang, Liqun and Han, Xie}, title = {SWPT: Spherical Window-based Point Cloud Transformer}, booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)}, month = {December}, year = {2022}, pages = {3034-3050} }