Mask-Ranking Network for Semi-Supervised Video Object Segmentation

Wenjing Li, Xiang Zhang, Yujie Hu, Yingqi Tang; Proceedings of the Asian Conference on Computer Vision (ACCV), 2020


Video object segmentation is a fundamental problem in video analysis, and many methods based on mask propagation and matching have been proposed in recent years. However, these two strategies depend heavily on either the most recent mask or the fixed mask given in the first frame, and hence cannot adapt well to high deformation and rapid motion of objects. In this paper, we propose a novel architecture named Mask-Ranking Network (MRNet), which combines the advantages of propagation-based and matching-based methods to address this problem. Specifically, to make better use of long-term previous masks, we propose a novel propagation mechanism that lets the network consider the previous information comprehensively. Under a unified encoder-decoder framework, we track the pixel-wise similarity of the object activation area in a long-term manner and explore the correlation between frames. In contrast to propagation-only or matching-only techniques, our method reduces the accumulation of errors during propagation and makes effective use of long-term previous frame information. In the video object segmentation task, MRNet handles object deformation better and produces more accurate segmentation results. We validate the effectiveness of the proposed method on the DAVIS 2016 and DAVIS 2017 datasets. Experimental results show that our method achieves state-of-the-art performance without online fine-tuning and is robust to long-term propagation.
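To make the idea of pixel-wise similarity against several earlier frames concrete, the following is a minimal sketch of generic feature matching, not the MRNet architecture itself. It assumes each frame is already encoded as a flat list of per-pixel feature vectors, and the names `cosine` and `foreground_similarity` are hypothetical helpers introduced only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def foreground_similarity(query_feats, references):
    """For each query pixel, return its best cosine similarity to any
    foreground pixel in any reference frame.

    query_feats: per-pixel feature vectors of the current frame.
    references:  (feats, mask) pairs from earlier frames, where mask[i]
                 is 1 for object pixels and 0 for background.
    """
    # Pool object features from all reference frames, not only the
    # first or the most recent one (an illustrative assumption).
    fg = [f for feats, mask in references for f, m in zip(feats, mask) if m]
    if not fg:
        return [0.0] * len(query_feats)
    return [max(cosine(q, f) for f in fg) for q in query_feats]


# Toy data: two reference frames with two pixels each, 2-D features.
first = ([[1.0, 0.0], [0.0, 1.0]], [1, 0])
previous = ([[1.0, 1.0], [0.0, 1.0]], [1, 0])
query = [[2.0, 2.0], [0.0, 3.0], [3.0, 0.0]]
scores = foreground_similarity(query, [first, previous])
```

In this toy setting, a query pixel scores highly if it resembles the object in either reference frame, which is one simple way to reduce reliance on a single mask. A learned model would replace the hand-written features and the hard maximum with trained components.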

Related Material

@InProceedings{Li_2020_ACCV,
    author    = {Li, Wenjing and Zhang, Xiang and Hu, Yujie and Tang, Yingqi},
    title     = {Mask-Ranking Network for Semi-Supervised Video Object Segmentation},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {November},
    year      = {2020}
}