Learn To Match: Automatic Matching Network Design for Visual Tracking

Zhipeng Zhang, Yihao Liu, Xiao Wang, Bing Li, Weiming Hu; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 13339-13348

Abstract


Siamese tracking has achieved groundbreaking performance in recent years, where the essence is the efficient matching operator cross-correlation and its variants. Besides the remarkable success, it is important to note that the heuristic matching network design relies heavily on expert experience. Moreover, we experimentally find that one sole matching operator is difficult to guarantee stable tracking in all challenging environments. Thus, in this work, we introduce six novel matching operators, namely Concatenation, Pointwise-Addition, Pairwise-Relation, FiLM, Simple-Transformer and Transductive-Guidance, to explore more feasibility on matching operator selection. The analyses reveal these operators' selective adaptability on different environment degradation types, which inspires us to combine them to explore complementary features. To this end, we propose binary channel manipulation (BCM) to search for the optimal combination of these operators. BCM determines to retrain or discard one operator by learning its contribution to other tracking steps. By inserting the learned matching networks to a strong baseline tracker Ocean, our model achieves favorable gains by 67.2 -> 71.4, 52.6 -> 58.3, 70.3 -> 76.0 AUC on OTB100, LaSOT, and TrackingNet, respectively. Notably, Our tracker runs at real-time speed of 50 / 100 FPS using PyTorch / TensorRT.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Zhang_2021_ICCV, author = {Zhang, Zhipeng and Liu, Yihao and Wang, Xiao and Li, Bing and Hu, Weiming}, title = {Learn To Match: Automatic Matching Network Design for Visual Tracking}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2021}, pages = {13339-13348} }