3D Prompt Learning for RGB-D Tracking

Bocen Li, Yunzhi Zhuge, Shan Jiang, Lijun Wang, Yifan Wang, Huchuan Lu; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 2527-2544

Abstract


Driven by the remarkable advances in RGB visual tracking, interest in RGB-D tracking has grown, owing to its robust performance even in challenging scenarios. To bridge the gap between RGB and RGB-D tracking, several 2D prompt learning methods have emerged, which primarily target downstream task adaptation. In contrast, we introduce a novel prompt learning method for RGB-D tracking, termed 3D Prompt Tracking (3DPT), which captures essential 3D geometric information and transforms base RGB trackers into RGB-D trackers through parameter-efficient tuning. Unlike counterparts that use depth maps as 2D prompts, we propose to directly encode 3D features from point clouds into base models, leading to superior discriminative power, particularly when the target and background distractors share a similar visual appearance. We achieve this through a carefully designed Geometry Prompt (GP) block, which effectively extracts 3D features and injects the 3D knowledge into the 2D base model. The GP block is generally applicable to recent visual trackers, yielding more robust tracking performance with reasonable computational overhead. Extensive experiments demonstrate that 3DPT delivers promising performance and generalizes across three popular RGB-D tracking datasets: DepthTrack, CDTB, and VOT-RGBD2022.

Related Material


[bibtex]
@InProceedings{Li_2024_ACCV,
    author    = {Li, Bocen and Zhuge, Yunzhi and Jiang, Shan and Wang, Lijun and Wang, Yifan and Lu, Huchuan},
    title     = {3D Prompt Learning for RGB-D Tracking},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {2527-2544}
}