@InProceedings{Lee_2026_WACV,
  author    = {Lee, Byeongseong and Min, Jihong},
  title     = {Training-Free Target Emphasis with SAM2 Pseudo-Masks for Robust Single Object Tracking},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
  month     = {March},
  year      = {2026},
  pages     = {125-131}
}
Training-Free Target Emphasis with SAM2 Pseudo-Masks for Robust Single Object Tracking
Abstract
Deploying Single Object Tracking (SOT) in real-world surveillance systems is often hampered by background clutter and similar distractors that contaminate the target template. To address this, we propose a practical, training-free target emphasis pipeline that leverages the SAM2 segmentation foundation model to purify the template without any model fine-tuning. While explicit input refinement was previously hindered by the lack of accurate and efficient segmenters, we revisit this strategy by exploiting the zero-shot capability of SAM2 in a realistic pseudo-mask setting. We evaluate our method on two representative transformer-based trackers, ARTrack and OSTrack, demonstrating that the approach is model-agnostic and yields consistent performance gains. Crucially, our empirical analysis reveals a counter-intuitive insight: while hard background removal (using GT masks) destroys essential edge context and induces domain shift, mild background scaling (α = 0.85) successfully balances noise suppression with context preservation. Furthermore, implemented as a one-shot initialization step, our method offers a plug-and-play solution for robustifying trackers without imposing computational overhead on the real-time tracking loop.
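The mild background scaling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `emphasize_target`, the array-based interface, and the default α = 0.85 taken from the abstract are assumptions; obtaining the pseudo-mask from SAM2 is out of scope here.

```python
import numpy as np

def emphasize_target(template, pseudo_mask, alpha=0.85):
    """Attenuate background pixels of a tracking template by a factor alpha.

    template    : (H, W, 3) image array for the target template.
    pseudo_mask : (H, W) boolean array, True on (pseudo-)target pixels,
                  e.g. produced by a SAM2 zero-shot segmentation.
    alpha       : background scaling factor; alpha=0 would be hard removal
                  (which, per the paper, destroys edge context), while a
                  mild value like 0.85 suppresses clutter but keeps context.
    """
    out = template.astype(np.float32)
    background = ~pseudo_mask
    # Scale only background pixels; target pixels pass through unchanged.
    out[background] *= alpha
    return out.astype(template.dtype)
```

Because this runs once when the template is initialized, it adds no per-frame cost to the tracking loop, matching the one-shot, plug-and-play design the abstract describes.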