Correlation Pyramid Network for 3D Single Object Tracking

Mengmeng Wang, Teli Ma, Xingxing Zuo, Jiajun Lv, Yong Liu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 3216-3225


In recent years, 3D LiDAR-based single object tracking (SOT) has gained increasing attention, as it plays a crucial role in 3D applications such as autonomous driving. The central problem is how to learn a target-aware representation from sparse and incomplete point clouds. In this paper, we propose a novel Correlation Pyramid Network (CorpNet) with a unified encoder and a motion-factorized decoder. Specifically, the main branch of the encoder introduces multi-level self-attention to enrich the template and search region features, and multi-level cross-attention to realize their fusion and interaction. Additionally, considering the sparsity of point clouds, we design a lateral correlation pyramid structure for the encoder that keeps as many points as possible by integrating hierarchical correlated features. The search region features output by the encoder can be fed directly into the decoder to predict target locations without any extra matcher. Moreover, in the decoder of CorpNet, we disentangle the 3D convolution into successive 2D and 1D convolution blocks and attach a BEV prediction head with an extra z-axis prediction head to explicitly learn motion along the up axis and in the x-y plane. Extensive experiments on two commonly used datasets (KITTI and nuScenes) show that CorpNet achieves state-of-the-art results while running in real time.
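
To give a rough sense of what disentangling a 3D convolution into successive 2D and 1D convolutions can buy, the sketch below compares learnable parameter counts. It is not the CorpNet decoder: the kernel size, the channel widths, the intermediate width `c_mid`, and the choice of applying the 2D kernel over the x-y plane before the 1D kernel along z are all assumptions made for illustration, since the abstract does not specify them.

```python
def conv_params(kernel_shape, c_in, c_out, bias=True):
    """Count learnable parameters of one dense convolution layer."""
    n = c_in * c_out
    for k in kernel_shape:
        n *= k
    return n + (c_out if bias else 0)


def full_3d_params(k, c_in, c_out):
    """A single k x k x k 3D convolution."""
    return conv_params((k, k, k), c_in, c_out)


def factorized_params(k, c_in, c_mid, c_out):
    """A k x k 2D convolution (assumed: x-y plane) followed by a
    length-k 1D convolution (assumed: z axis)."""
    planar = conv_params((k, k), c_in, c_mid)
    vertical = conv_params((k,), c_mid, c_out)
    return planar + vertical


# Hypothetical layer sizes, chosen only for the comparison.
full = full_3d_params(3, 64, 64)
fact = factorized_params(3, 64, 64, 64)
print(full, fact)
```

With these assumed sizes the factorized pair needs fewer than half the weights of the full 3D kernel. Separating the planar and vertical kernels also fits the abstract's motivation of learning x-y and up-axis motion with separate heads.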

Related Material

@InProceedings{Wang_2023_CVPR,
    author    = {Wang, Mengmeng and Ma, Teli and Zuo, Xingxing and Lv, Jiajun and Liu, Yong},
    title     = {Correlation Pyramid Network for 3D Single Object Tracking},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {3216-3225}
}