Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking
Segmentation-based tracking is a promising paradigm because it is more robust to non-rigid deformations than traditional box-based tracking methods. However, existing segmentation-based trackers are insufficient in modeling and exploiting dense pixel-wise correspondence across frames. To overcome these limitations, this paper presents a novel segmentation-based tracking architecture equipped with spatio-appearance memory networks. The appearance memory bank uses spatio-temporal non-local similarity to propagate the segmentation mask to the current frame, which effectively captures long-range appearance variations. We further treat a discriminative correlation filter as a spatial memory bank that stores the mapping between the feature map and the spatial map. Moreover, mutual promotion between the dual memory networks greatly boosts overall tracking performance. We also propose a dynamic memory machine (DMM), which employs the Earth Mover's Distance (EMD) to reweight memory samples. Without bells and whistles, our simple-yet-effective tracking architecture sets a new state of the art on six tracking benchmarks. In addition, our approach achieves comparable results on two video object segmentation benchmarks. Code and model are released at https://github.com/phiphiphi31/DMB.
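To make the idea of propagating a mask through non-local similarity more concrete, the following is a minimal sketch and not the released implementation. It assumes dot-product similarity followed by a softmax over memory locations, which is one common form of non-local read; the abstract does not specify the exact similarity function. The function name `propagate_mask` and the toy feature vectors are hypothetical and chosen only for illustration.

```python
import math


def propagate_mask(query_feats, memory_feats, memory_mask):
    """Soft-label each current-frame pixel from stored memory pixels.

    query_feats:  list of feature vectors, one per current-frame pixel
    memory_feats: list of feature vectors, one per memory pixel
                  (pixels of all memory frames flattened together)
    memory_mask:  foreground probability in [0, 1] for each memory pixel

    Assumption: dot-product similarity with a softmax over memory
    locations; the paper may use a different similarity.
    """
    propagated = []
    for q in query_feats:
        # Similarity between this query pixel and every memory pixel.
        sims = [sum(a * b for a, b in zip(q, m)) for m in memory_feats]
        # Numerically stable softmax over memory locations.
        peak = max(sims)
        exps = [math.exp(s - peak) for s in sims]
        total = sum(exps)
        # Attention-weighted average of the stored mask values.
        propagated.append(sum(e * v for e, v in zip(exps, memory_mask)) / total)
    return propagated


# Toy example: two memory pixels, one foreground and one background.
memory_feats = [[1.0, 0.0], [0.0, 1.0]]
memory_mask = [1.0, 0.0]
query_feats = [[5.0, 0.0], [0.0, 5.0]]
result = propagate_mask(query_feats, memory_feats, memory_mask)
```

In this toy setup the first query pixel resembles the foreground memory pixel and receives a propagated value close to 1, while the second resembles the background pixel and receives a value close to 0. Reweighting memory samples, as the DMM is described as doing with EMD, would correspond to scaling each memory pixel's contribution before the weighted average; that step is not shown here.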