Video Object Segmentation Using Global and Instance Embedding Learning

Wenbin Ge, Xiankai Lu, Jianbing Shen; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 16836-16845

Abstract


In this paper, we propose a feature-embedding-based video object segmentation (VOS) method which is simple, fast and effective. The current VOS task involves two main challenges: object instance differentiation and cross-frame instance alignment. Most state-of-the-art matching-based VOS methods simplify this task into a binary segmentation task and tackle each instance independently. In contrast, we decompose the VOS task into two subtasks: global embedding learning, which segments the foreground objects of each frame in a pixel-to-pixel manner, and instance feature embedding learning, which separates instances. The outputs of these two subtasks are fused to obtain the final instance masks quickly and accurately. By exploiting the relations among different instances within each frame as well as the temporal relations across frames, the proposed network learns to differentiate multiple instances and associate them properly in a single feed-forward pass. Extensive experimental results on the challenging DAVIS and YouTube-VOS datasets show that our method achieves better performance than most counterparts in each case.
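
As a rough illustration of how a foreground map and per-pixel instance embeddings could be fused into instance masks, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the fusion rule, so the nearest-prototype assignment, the function name `fuse_masks`, the `prototypes` array and the `fg_threshold` parameter are all assumptions made for illustration.

```python
import numpy as np

def fuse_masks(fg_prob, inst_embed, prototypes, fg_threshold=0.5):
    """Hypothetical fusion rule (an assumption, not taken from the paper).

    fg_prob:     (H, W) foreground probability from a global branch.
    inst_embed:  (H, W, D) per-pixel instance embeddings.
    prototypes:  (K, D) one reference embedding per instance.
    Returns an (H, W) integer map: 0 = background, 1..K = instance id.
    """
    # Squared Euclidean distance from every pixel embedding to every prototype: (H, W, K).
    diff = inst_embed[:, :, None, :] - prototypes[None, None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    # Nearest prototype per pixel, shifted so that instance ids start at 1.
    nearest = np.argmin(dist, axis=-1) + 1
    # Pixels the global branch marks as background are forced to label 0.
    return np.where(fg_prob > fg_threshold, nearest, 0)

# Toy 2x2 frame with two instances and 2-D embeddings.
fg = np.array([[0.9, 0.1], [0.8, 0.7]])
emb = np.array([[[0.0, 0.0], [0.0, 0.0]],
                [[1.0, 1.0], [0.1, 0.0]]])
protos = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = fuse_masks(fg, emb, protos)
print(labels.tolist())  # [[1, 0], [2, 1]]
```

In this toy setup the foreground map decides which pixels receive a label at all, while the embeddings only decide which instance a foreground pixel belongs to, so all instances are labeled in one pass rather than one binary segmentation per object.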

Related Material


[bibtex]
@InProceedings{Ge_2021_CVPR,
    author    = {Ge, Wenbin and Lu, Xiankai and Shen, Jianbing},
    title     = {Video Object Segmentation Using Global and Instance Embedding Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {16836-16845}
}