@InProceedings{Kong_2024_ACCV,
    author    = {Kong, Wenhao and Zhang, Xiaowei},
    title     = {HT-SSPG: Hierarchical Transformers for Semantic Surface Point Generation in 3D Object Detection},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {3672-3688}
}
HT-SSPG: Hierarchical Transformers for Semantic Surface Point Generation in 3D Object Detection
Abstract
Currently, incomplete structure in LiDAR point clouds has become the primary challenge for improving detector performance. Point cloud completion methods address this issue by adding more points to regions of interest; however, because of imprecise proposals and coarse feature extraction, these approaches often generate numerous low-quality points, which limits detection performance. To tackle this issue, we propose hierarchical transformers for semantic surface point generation in 3D object detection (HT-SSPG), which leverages a voxel supervised network (VSN) and a hierarchical attention refinement (HAR) network to generate high-quality proposals and complete semantic surface points for precise detection. Specifically, the VSN enhances the backbone network's perception of spatial structures using 3D heatmaps, capturing complete structural and positional information of missing objects. The HAR module uses cross-attention transformers to integrate voxel and point cloud features and accurately estimate the complete shape and position of objects, thus generating high-quality semantic surface points. Extensive experiments demonstrate that HT-SSPG achieves leading performance on the KITTI dataset. Compared to PG-RCNN, our method significantly improves detection accuracy for small objects such as pedestrians and cyclists: it outperforms PG-RCNN in pedestrian detection by 8.46% AP and 8.08% AP at the moderate and hard levels, respectively.
Related Material