DIPNet: Dynamic Identity Propagation Network for Video Object Segmentation

Ping Hu, Jun Liu, Gang Wang, Vitaly Ablavsky, Kate Saenko, Stan Sclaroff; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 1904-1913

Abstract


Many recent methods for semi-supervised Video Object Segmentation (VOS) have achieved good performance by exploiting the annotated first frame via one-shot fine-tuning or mask propagation. However, relying heavily on the first frame may weaken the robustness of VOS, since video objects can vary greatly over time. In this work, we propose a Dynamic Identity Propagation Network (DIPNet) that adaptively propagates and accurately segments video objects over time. To achieve this, DIPNet disentangles the VOS task at each time step into a dynamic propagation phase and a spatial segmentation phase. The former uses a novel identity representation to adaptively propagate the objects' reference information over time, which improves robustness to the objects' temporal variations. The latter uses the propagated information to treat object segmentation as an easier static-image problem that can be optimized with slight fine-tuning on the first frame, thus reducing the computational cost. By optimizing these two components to complement each other, we obtain a robust system for VOS. Evaluations on four benchmark datasets show that DIPNet achieves state-of-the-art performance while remaining time-efficient.
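
The two-phase structure described in the abstract can be illustrated with a toy loop. This is only a minimal sketch under strong assumptions, not the DIPNet architecture: a "frame" is a flat list of pixel intensities, the "identity" is a single mean intensity, and the names `init_identity`, `propagate`, `segment`, `segment_video`, `rate`, and `tol` are all hypothetical. The abstract does not specify how the identity representation or either phase is implemented.

```python
from typing import List, Sequence

Frame = Sequence[float]   # toy stand-in: a "frame" is a flat list of pixel intensities
Mask = List[int]          # 1 = object pixel, 0 = background


def init_identity(frame: Frame, mask: Mask) -> float:
    """Hypothetical identity: mean intensity of the annotated object pixels.

    Assumes the first-frame mask marks at least one object pixel.
    """
    values = [p for p, m in zip(frame, mask) if m]
    return sum(values) / len(values)


def propagate(identity: float, frame: Frame, prev_mask: Mask, rate: float = 0.5) -> float:
    """Dynamic propagation phase (toy): blend the reference with the current appearance."""
    values = [p for p, m in zip(frame, prev_mask) if m]
    if not values:  # previous mask is empty: keep the old reference unchanged
        return identity
    observed = sum(values) / len(values)
    return (1.0 - rate) * identity + rate * observed


def segment(frame: Frame, identity: float, tol: float = 0.2) -> Mask:
    """Spatial segmentation phase (toy): label pixels whose intensity is near the reference."""
    return [1 if abs(p - identity) < tol else 0 for p in frame]


def segment_video(frames, first_mask, propagate_fn=propagate) -> List[Mask]:
    """Run propagation, then segmentation, at every time step after the first frame."""
    identity = init_identity(frames[0], first_mask)
    masks = [list(first_mask)]
    for frame in frames[1:]:
        identity = propagate_fn(identity, frame, masks[-1])
        masks.append(segment(frame, identity))
    return masks


# Toy sequence: the object (first two pixels) fades over time.
frames = [
    [0.90, 0.80, 0.10, 0.00],
    [0.70, 0.75, 0.10, 0.05],
    [0.60, 0.55, 0.20, 0.10],
]
first_mask = [1, 1, 0, 0]

adaptive = segment_video(frames, first_mask)
# Baseline that never updates the first-frame reference.
static = segment_video(frames, first_mask, propagate_fn=lambda ident, frame, mask: ident)
```

In this toy sequence the adaptive loop keeps both object pixels in every frame, while the baseline that holds the first-frame reference fixed loses them in the last frame. That is the kind of temporal variation the abstract says first-frame-only methods handle poorly.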

Related Material


@InProceedings{Hu_2020_WACV,
    author    = {Hu, Ping and Liu, Jun and Wang, Gang and Ablavsky, Vitaly and Saenko, Kate and Sclaroff, Stan},
    title     = {DIPNet: Dynamic Identity Propagation Network for Video Object Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {March},
    year      = {2020}
}