MoNet: Deep Motion Exploitation for Video Object Segmentation

Huaxin Xiao, Jiashi Feng, Guosheng Lin, Yu Liu, Maojun Zhang; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 1140-1148


In this paper, we propose a novel model, MoNet, that deeply exploits motion cues to boost video object segmentation performance in two respects: frame representation learning and segmentation refinement. Concretely, MoNet uses a computed motion cue (i.e., optical flow) to reinforce the representation of the target frame by aligning and integrating representations from its neighboring frames. The new representation provides valuable temporal context for segmentation and improves robustness to common contaminating factors, e.g., motion blur, appearance variation, and deformation of video objects. Moreover, MoNet exploits motion inconsistency and transforms this motion cue into a foreground/background prior to eliminate distraction from confusing instances and noisy regions. By introducing a distance transform layer, MoNet can effectively separate motion-inconsistent instances/regions and thoroughly refine segmentation results. Integrating the two proposed motion exploitation components with a standard segmentation network, MoNet provides new state-of-the-art performance on three competitive benchmark datasets.
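The two ideas in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the nearest-neighbor warping, the plain averaging of aligned features, the median-based test for "inconsistent" motion, and the exponential decay of the prior are all assumptions chosen only to keep the example small.

```python
import numpy as np

def warp_nearest(feat, flow):
    """Backward-warp a neighbor feature map (H, W, C) onto the target frame.

    Assumption: flow[y, x] = (dx, dy) points from target pixel (x, y)
    to its matching location in the neighbor frame.
    """
    h, w = feat.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return feat[src_y, src_x]

def fuse(target_feat, neighbor_feats, flows):
    """Align each neighbor to the target, then average (illustrative fusion)."""
    warped = [warp_nearest(f, fl) for f, fl in zip(neighbor_feats, flows)]
    return np.mean([target_feat] + warped, axis=0)

def distance_to_mask(mask):
    """Brute-force Euclidean distance from each pixel to the nearest True pixel."""
    h, w = mask.shape
    seeds = np.argwhere(mask)                                   # (N, 2) as (y, x)
    grid = np.stack(np.mgrid[0:h, 0:w], axis=-1).reshape(-1, 1, 2)
    dist = np.sqrt(((grid - seeds[None]) ** 2).sum(-1)).min(axis=1)
    return dist.reshape(h, w)

def motion_prior(flow, threshold=1.0):
    """Hypothetical prior: 1 where motion deviates from the dominant motion,
    decaying with distance from those pixels."""
    mag = np.linalg.norm(flow, axis=-1)
    inconsistent = np.abs(mag - np.median(mag)) > threshold
    if not inconsistent.any():
        return np.zeros_like(mag)
    return np.exp(-distance_to_mask(inconsistent))

# Usage: one neighbor whose content is shifted by one pixel along x.
feat = np.arange(4, dtype=float).reshape(1, 4, 1)
flow = np.zeros((1, 4, 2))
flow[..., 0] = 1.0
fused = fuse(feat, [feat], [flow])

# Usage: a single moving pixel in an otherwise static 3x3 flow field.
flow2 = np.zeros((3, 3, 2))
flow2[1, 1] = (3.0, 0.0)
prior = motion_prior(flow2)
```

In the paper's setting the aligned features would come from a segmentation network and the prior would refine its predicted mask; here both are reduced to small arrays so that each step can be inspected directly.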

Related Material

author = {Xiao, Huaxin and Feng, Jiashi and Lin, Guosheng and Liu, Yu and Zhang, Maojun},
title = {MoNet: Deep Motion Exploitation for Video Object Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}