Motion-Guided Cascaded Refinement Network for Video Object Segmentation

Ping Hu, Gang Wang, Xiangfei Kong, Jason Kuen, Yap-Peng Tan; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 1400-1409


Deep CNNs have achieved superior performance in many computer vision and image understanding tasks. However, it is still difficult to apply deep CNNs effectively to video object segmentation (VOS), since treating video frames as separate, static images discards the information carried by motion. To tackle this problem, we propose a Motion-guided Cascaded Refinement Network for VOS. Assuming that object motion normally differs from background motion, for each video frame we first apply an active contour model to the optical flow to coarsely segment the objects of interest. The proposed Cascaded Refinement Network (CRN) then takes this coarse segmentation as guidance to generate an accurate full-resolution segmentation. In this way, motion information and deep CNNs complement each other to accurately segment objects from video frames. Furthermore, in the CRN we introduce a Single-channel Residual Attention Module that incorporates the coarse segmentation map as attention, making our network effective and efficient in both training and testing. We perform experiments on popular benchmarks, and the results show that our method achieves state-of-the-art performance at a much faster speed.
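The two-stage idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on stated assumptions: the coarse stage below replaces the active contour model with a much simpler stand-in (thresholding each pixel's flow deviation from the median flow, which is taken as the background motion), and the attention step assumes a residual form `features * (1 + attention)` with a single-channel map broadcast over all feature channels. The function names, the `thresh` parameter, and the array shapes are hypothetical.

```python
import numpy as np


def coarse_motion_mask(flow, thresh=1.0):
    """Coarse foreground mask from optical flow of shape (H, W, 2).

    Stand-in for the active contour step (an assumption of this sketch):
    the median flow vector is treated as background motion, and pixels
    whose flow deviates from it by more than `thresh` are marked as object.
    """
    background = np.median(flow.reshape(-1, 2), axis=0)
    deviation = np.linalg.norm(flow - background, axis=-1)
    return (deviation > thresh).astype(np.float32)


def single_channel_residual_attention(features, attention):
    """Apply a single-channel attention map of shape (H, W) to features (C, H, W).

    Assumed residual form: features * (1 + attention). Regions with zero
    attention pass through unchanged; attended regions are amplified.
    """
    return features * (1.0 + attention[None, :, :])


# Toy frame: static background with a 2x2 patch moving 3 pixels to the right.
flow = np.zeros((4, 4, 2), dtype=np.float32)
flow[1:3, 1:3, 0] = 3.0

mask = coarse_motion_mask(flow)                 # (4, 4) coarse segmentation
features = np.ones((2, 4, 4), dtype=np.float32)  # placeholder CNN features
refined = single_channel_residual_attention(features, mask)
```

Because the attention map has a single channel, it adds one elementwise multiply per feature map rather than a learned per-channel attention branch, which is consistent with the efficiency claim in the abstract. The residual `1 +` term keeps features outside the coarse mask intact, so errors in the coarse segmentation do not zero out information the network may still need.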

Related Material

author = {Hu, Ping and Wang, Gang and Kong, Xiangfei and Kuen, Jason and Tan, Yap-Peng},
title = {Motion-Guided Cascaded Refinement Network for Video Object Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}