DepthTrack: Unveiling the Power of RGBD Tracking

Song Yan, Jinyu Yang, Jani Käpylä, Feng Zheng, Aleš Leonardis, Joni-Kristian Kämäräinen; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 10725-10733


RGBD (RGB plus depth) object tracking is gaining momentum as RGBD sensors have become popular in many application fields such as robotics. However, the best RGBD trackers are extensions of the state-of-the-art deep RGB trackers. They are trained with RGB data, and the depth channel is used as a sidekick for subtleties such as occlusion detection. This can be explained by the fact that there are no sufficiently large RGBD datasets to 1) train "deep depth trackers" and 2) challenge RGB trackers with sequences for which the depth cue is essential. This work introduces a new RGBD tracking dataset, DepthTrack, that has twice as many sequences (200) and scene types (40) as the largest existing dataset, and three times as many objects (90). In addition, the average length of the sequences (1473 frames), the number of deformable objects (16) and the number of annotated tracking attributes (15) have been increased. Furthermore, by running the SotA RGB and RGBD trackers on DepthTrack, we propose a new RGBD tracking baseline, namely DeT, which reveals that deep RGBD tracking indeed benefits from genuine training data. The code and dataset are available at

Related Material

@InProceedings{Yan_2021_ICCV,
    author    = {Yan, Song and Yang, Jinyu and K\"apyl\"a, Jani and Zheng, Feng and Leonardis, Ale\v{s} and K\"am\"ar\"ainen, Joni-Kristian},
    title     = {DepthTrack: Unveiling the Power of RGBD Tracking},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {10725-10733}
}