Temporal-aware Siamese Tracker: Integrate Temporal Context for 3D Object Tracking
Learning a discriminative target-specific feature representation for object localization is the core of 3D Siamese object tracking algorithms. Current Siamese trackers aggregate target information from only the latest template into the search area to construct target-specific features, which limits performance when the object is occluded or missing. To this end, we propose a novel temporal-aware Siamese tracking framework in which the rich target cues contained in a set of historical templates are integrated into the search area for reliable target-specific feature aggregation. Specifically, our method consists of three modules: a template set sampling module, a temporal feature enhancement module, and a temporal-aware feature aggregation module. In the template set sampling module, a scoring network evaluates the tracking quality of each template so that only high-quality templates are collected to form the historical template set. Then, given the initial feature embeddings of the historical templates, the temporal feature enhancement module concatenates all template embeddings and feeds them into a linear attention module for cross-template feature enhancement. Next, the temporal-aware feature aggregation module aggregates the target cues in each template into the search area to construct multiple historical target-specific search-area features. In particular, we fuse all generated target-specific features in the order in which the templates were collected via an RNN-based module, so that the fusion weight of earlier template information is discounted to better fit the current tracking state. Finally, we feed the temporally fused target-specific feature into a modified CenterPoint detection head for target position regression. Extensive experiments on the KITTI, nuScenes, and Waymo Open datasets demonstrate the effectiveness of the proposed method.
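The data flow through the three modules can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the scoring network, the attention projections, the per-template aggregation operator, or the RNN variant, so everything below is an assumption made for illustration. The scores are given as fixed numbers rather than predicted, the linear attention uses identity projections with an `elu(x) + 1` feature map, the aggregation is a plain softmax cross-attention, the fusion is a vanilla tanh RNN, and the detection head is omitted. All function names (`select_templates`, `linear_attention`, `aggregate`, `recurrent_fuse`) are hypothetical.

```python
import numpy as np

def select_templates(scores, k):
    """Keep the k highest-scoring templates, returned in collection order."""
    top = np.argsort(scores)[-k:]
    return np.sort(top)

def linear_attention(x, eps=1e-6):
    """Linear self-attention over tokens x of shape (N, D).

    Assumption: identity Q/K/V projections and feature map phi(u) = elu(u) + 1.
    Computing K^T V first makes the cost linear in the number of tokens N.
    """
    phi = np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))
    kv = phi.T @ x                    # (D, D) summary of keys and values
    z = phi @ phi.sum(axis=0)         # (N,) per-query normalizer
    return (phi @ kv) / (z[:, None] + eps)

def aggregate(search, template):
    """Assumed aggregation: search points attend to template points (softmax)."""
    logits = search @ template.T / np.sqrt(search.shape[1])
    logits = logits - logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w = w / w.sum(axis=1, keepdims=True)
    return search + w @ template      # residual target-specific feature

def recurrent_fuse(features, w_in, w_h):
    """Vanilla tanh RNN over per-template features, oldest template first.

    Earlier inputs pass through more recurrent steps, so their influence on
    the final state is progressively discounted.
    """
    h = np.zeros((features[0].shape[0], w_h.shape[0]))
    for f in features:
        h = np.tanh(f @ w_in + h @ w_h)
    return h

rng = np.random.default_rng(0)
T, N, M, D, H = 4, 8, 16, 6, 6        # templates, template pts, search pts, dims
templates = rng.normal(size=(T, N, D))
search = rng.normal(size=(M, D))
scores = np.array([0.2, 0.9, 0.5, 0.8])  # stand-in for scoring-network output

keep = select_templates(scores, k=2)                 # template set sampling
tokens = templates[keep].reshape(-1, D)              # concatenate templates
enhanced = linear_attention(tokens).reshape(len(keep), N, D)
per_template = [aggregate(search, t) for t in enhanced]
w_in = rng.normal(scale=0.1, size=(D, H))
w_h = rng.normal(scale=0.1, size=(H, H))
fused = recurrent_fuse(per_template, w_in, w_h)      # input to a detection head
print(keep, enhanced.shape, fused.shape)
```

Because the recurrent fusion processes templates in collection order, the most recent template contributes to the fused feature most directly, which mirrors the discounting of older template information described above.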