Fast Video Object Segmentation via Dynamic Targeting Network

Lu Zhang, Zhe Lin, Jianming Zhang, Huchuan Lu, You He; The IEEE International Conference on Computer Vision (ICCV), 2019, pp. 5582-5591


We propose a new model for fast and accurate video object segmentation. It consists of two convolutional neural networks: a Dynamic Targeting Network (DTN) and a Mask Refinement Network (MRN). DTN locates the object by dynamically focusing on a region of interest surrounding the target object. It predicts this target region via two sub-streams, Box Propagation (BP) and Box Re-identification (BR). The BP stream is faster but less effective for objects undergoing large deformation or occlusion. The BR stream performs better in such difficult scenarios, at a higher computational cost. A Decision Module (DM) adaptively determines which sub-stream to use for each frame. Finally, MRN predicts the segmentation within the target region. Experimental results on two public datasets demonstrate that the proposed model significantly outperforms existing methods that do not use online training in both accuracy and efficiency, and is comparable in accuracy to online-training-based methods while running an order of magnitude faster.
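The per-frame control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `segment_video`, the callable signatures, and the assumption that the loop starts from a known initial box are all hypothetical, and the abstract does not say how the Decision Module reaches its choice, so that logic is passed in as an opaque callable.

```python
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); a hypothetical box format


def segment_video(
    frames: Sequence[Any],
    initial_box: Box,
    use_propagation: Callable[[Any, Box], bool],   # stands in for the DM
    propagate_box: Callable[[Any, Box], Box],      # stands in for the BP stream
    reidentify_box: Callable[[Any, Box], Box],     # stands in for the BR stream
    refine_mask: Callable[[Any, Box], Any],        # stands in for the MRN
) -> Tuple[List[Any], List[str]]:
    """Route each frame through BP or BR, then segment inside the chosen box."""
    box = initial_box
    masks: List[Any] = []
    routes: List[str] = []
    for frame in frames:
        if use_propagation(frame, box):
            # Cheap path: update the box from the previous frame's box.
            box = propagate_box(frame, box)
            routes.append("BP")
        else:
            # Costlier path: locate the target again, given the initial box
            # as a reference (an assumption of this sketch).
            box = reidentify_box(frame, initial_box)
            routes.append("BR")
        # Segmentation is predicted only within the target region.
        masks.append(refine_mask(frame, box))
    return masks, routes
```

In this sketch the speed advantage comes from taking the cheaper BP path whenever the decision callable allows it and falling back to BR only on difficult frames. How often each path is taken depends entirely on the decision logic, which is left unspecified here.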

Related Material

author = {Zhang, Lu and Lin, Zhe and Zhang, Jianming and Lu, Huchuan and He, You},
title = {Fast Video Object Segmentation via Dynamic Targeting Network},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}