-
[pdf]
[supp]
[bibtex]@InProceedings{Xie_2024_CVPR, author = {Xie, Fei and Wang, Zhongdao and Ma, Chao}, title = {DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {19113-19124} }
DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking
Abstract
Existing Siamese or transformer trackers commonly pose visual object tracking as a one-shot detection problem i.e. locating the target object in a single forward evaluation scheme. Despite the demonstrated success these trackers may easily drift towards distractors with similar appearance due to the single forward evaluation scheme lacking self-correction. To address this issue we cast visual tracking as a point set based denoising diffusion process and propose a novel generative learning based tracker dubbed DiffusionTrack. Our DiffusionTrack possesses two appealing properties: 1) It follows a novel noise-to-target tracking paradigm that leverages multiple denoising diffusion steps to localize the target in a dynamic searching manner per frame. 2) It models the diffusion process using a point set representation which can better handle appearance variations for more precise localization. One side benefit is that DiffusionTrack greatly simplifies the post-processing e.g. removing window penalty scheme. Without bells and whistles our DiffusionTrack achieves leading performance over the state-of-the-art trackers and runs in real-time. The code is in https://github.com/VISION-SJTU/DiffusionTrack.
Related Material