Motion-Guided Spatial Time Attention for Video Object Segmentation

Qiang Zhou, Zilong Huang, Lichao Huang, Yongchao Gong, Han Shen, Wenyu Liu, Xinggang Wang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2019


In this paper, we propose a novel motion-guided attention module that embeds spatial and temporal consistency into the correlation map between the current frame and the historical frames. Unlike other mask-propagation-based methods, our method treats the previous mask as a strong prior instead of concatenating it to the current frame or feature for propagation. Additionally, to reduce the gap between the training and testing phases, we propose an improved optimization strategy, named sequence learning, which feeds a video into the end-to-end network in chronological order during training instead of several randomly sampled frames. Sequence learning makes our model more aware of the concepts of object tracking and recognition. We evaluated the proposed algorithm on the second YouTube-VOS test-challenge set and achieved a J&F mean score of 81.7%, ranking second on the VOS track. In the challenge, our method uses only ResNet-50 as the backbone, and our score is only 0.1% below the first-place score, which indicates that our VOS framework is state of the art.
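
The contrast between random-frame sampling and sequence learning can be illustrated with a minimal sketch of how training frame indices might be chosen. This is not the authors' implementation: the function names, the clip length `k`, and the choice of a contiguous clip rather than the whole video are assumptions made only for illustration.

```python
import random


def random_frame_indices(num_frames, k, rng):
    """Baseline (assumed): draw k distinct frames at random, sorted by time."""
    if k > num_frames:
        raise ValueError("k must not exceed num_frames")
    return sorted(rng.sample(range(num_frames), k))


def sequential_frame_indices(num_frames, k, rng):
    """Sequence-style (assumed): a contiguous run of k frames in
    chronological order, starting at a random offset."""
    if k > num_frames:
        raise ValueError("k must not exceed num_frames")
    start = rng.randrange(num_frames - k + 1)
    return list(range(start, start + k))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 30-frame video, 4 frames per training sample.
    print("random:    ", random_frame_indices(30, 4, rng))
    print("sequential:", sequential_frame_indices(30, 4, rng))
```

With contiguous indices, each training step sees the same frame-to-frame ordering the model encounters at test time, which is the gap the abstract says sequence learning is meant to reduce.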

Related Material

author = {Zhou, Qiang and Huang, Zilong and Huang, Lichao and Gong, Yongchao and Shen, Han and Liu, Wenyu and Wang, Xinggang},
title = {Motion-Guided Spatial Time Attention for Video Object Segmentation},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2019}