Leveraging the Power of Data Augmentation for Transformer-Based Tracking

Jie Zhao, Johan Edstedt, Michael Felsberg, Dong Wang, Huchuan Lu; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 6469-6478

Abstract


Due to long-distance correlation and powerful pretrained models, transformer-based methods have initiated a breakthrough in visual object tracking performance. Previous works focus on designing effective architectures suited for tracking, but ignore that data augmentation is equally crucial for training a well-performing model. In this paper, we first explore the impact of general data augmentations on transformer-based trackers via systematic experiments, and reveal the limited effectiveness of these common strategies. Motivated by experimental observations, we then propose two data augmentation methods customized for tracking. First, we optimize existing random cropping via a dynamic search radius mechanism and simulation for boundary samples. Second, we propose a token-level feature mixing augmentation strategy, which enables the model against challenges like background interference. Extensive experiments on two transformer-based trackers and six benchmarks demonstrate the effectiveness and data efficiency of our methods, especially under challenging settings, like one-shot tracking and small image resolutions. Code is available at https://github.com/zj5559/DATr.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Zhao_2024_WACV, author = {Zhao, Jie and Edstedt, Johan and Felsberg, Michael and Wang, Dong and Lu, Huchuan}, title = {Leveraging the Power of Data Augmentation for Transformer-Based Tracking}, booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)}, month = {January}, year = {2024}, pages = {6469-6478} }