Adaptive and Background-Aware Vision Transformer for Real-Time UAV Tracking

Shuiwang Li, Yangxiang Yang, Dan Zeng, Xucheng Wang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 13989-14000

Abstract


While discriminative correlation filter (DCF)-based trackers prevail in UAV tracking for their favorable efficiency, lightweight convolutional neural network (CNN)-based trackers using filter pruning have also demonstrated remarkable efficiency and precision. However, the use of pure vision transformer models (ViTs) for UAV tracking remains unexplored, which is surprising given that ViTs have been shown to produce better performance and greater efficiency than CNNs in image classification. In this paper, we propose an efficient ViT-based tracking framework, Aba-ViTrack, for UAV tracking. In our framework, feature learning and template-search coupling are integrated into an efficient one-stream ViT to avoid an extra heavy relation modeling module. The proposed Aba-ViT exploits an adaptive and background-aware token computation method to reduce inference time. This approach adaptively discards tokens based on learned halting probabilities, which a priori are higher for background tokens than for target ones. Extensive experiments on six UAV tracking benchmarks demonstrate that the proposed Aba-ViTrack achieves state-of-the-art performance in UAV tracking. Code is available at https://github.com/xyyang317/Aba-ViTrack.
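
The token-discarding idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact halting rule, so the code below assumes a generic adaptive-computation-style scheme: each token accumulates a per-layer halting probability and is dropped once the running sum reaches 1 - eps. The function name `halt_tokens`, the threshold `eps`, and the example probabilities are hypothetical and are not taken from the Aba-ViTrack code base; in the actual model the probabilities would be learned rather than supplied by hand.

```python
from typing import List


def halt_tokens(halting_probs: List[List[float]], eps: float = 0.01) -> List[int]:
    """Return how many layers each token is processed by.

    halting_probs[l][k] is an assumed halting probability for token k at
    layer l. A token is discarded once its cumulative halting score
    reaches 1 - eps (an assumed rule, not the paper's stated one).
    """
    num_tokens = len(halting_probs[0])
    cumulative = [0.0] * num_tokens   # running halting score per token
    active = [True] * num_tokens      # tokens still being computed
    depth = [0] * num_tokens          # layers actually spent on each token

    for layer_probs in halting_probs:
        for k in range(num_tokens):
            if not active[k]:
                continue              # discarded tokens cost nothing further
            depth[k] += 1
            cumulative[k] += layer_probs[k]
            if cumulative[k] >= 1.0 - eps:
                active[k] = False     # halt: drop this token from later layers
    return depth


# Hypothetical example: tokens 0-1 lie on the target (low halting probability),
# tokens 2-3 lie on the background (high halting probability), over 4 layers.
probs = [[0.1, 0.1, 0.6, 0.6] for _ in range(4)]
depths = halt_tokens(probs)
```

In this toy setting the background tokens leave the computation earlier than the target tokens, which is the source of the inference-time saving the abstract refers to.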

Related Material


[bibtex]
@InProceedings{Li_2023_ICCV,
    author    = {Li, Shuiwang and Yang, Yangxiang and Zeng, Dan and Wang, Xucheng},
    title     = {Adaptive and Background-Aware Vision Transformer for Real-Time UAV Tracking},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {13989-14000}
}