@InProceedings{Yang_2021_ICCV,
  author    = {Yang, Charig and Lamdouar, Hala and Lu, Erika and Zisserman, Andrew and Xie, Weidi},
  title     = {Self-Supervised Video Object Segmentation by Motion Grouping},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2021},
  pages     = {7177-7188}
}
Self-Supervised Video Object Segmentation by Motion Grouping
Abstract
Animals have evolved highly functional visual systems to understand motion, assisting perception even in complex environments. In this paper, we work towards developing a computer vision system able to segment objects by exploiting motion cues, i.e. motion segmentation. To achieve this, we introduce a simple variant of the Transformer to segment optical flow frames into primary objects and the background, which can be trained in a self-supervised manner, i.e. without using any manual annotations. Despite using only optical flow, and no appearance information, as input, our approach achieves superior results compared to previous state-of-the-art self-supervised methods on public benchmarks (DAVIS2016, SegTrackv2, FBMS59), while being an order of magnitude faster. On a challenging camouflage dataset (MoCA), we significantly outperform other self-supervised approaches, and are competitive with the top supervised approach, highlighting the importance of motion cues and the potential bias towards appearance in existing video segmentation models.
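The paper's model is a learned Transformer variant trained without labels; as a rough intuition for "grouping by motion", the following toy sketch clusters a synthetic optical-flow field into two segments with an iterative, attention-style assignment step (NumPy only). All function names, the fixed two-slot setup, and the soft-clustering update are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def motion_grouping(flow, num_slots=2, iters=3, seed=0):
    """Group pixels of an optical-flow field into `num_slots` segments
    by iterative soft assignment (a slot-attention-style toy, NOT the
    paper's trained Transformer).

    flow: (H, W, 2) array of per-pixel motion vectors.
    Returns (masks, slots): integer segmentation map and slot motions.
    """
    rng = np.random.default_rng(seed)
    h, w, _ = flow.shape
    feats = flow.reshape(-1, 2)              # (N, 2) flow vectors as features
    slots = rng.normal(size=(num_slots, 2))  # randomly initialised slot motions
    for _ in range(iters):
        # attention: each pixel attends to the slot whose motion it matches
        logits = -((feats[:, None, :] - slots[None, :, :]) ** 2).sum(-1)
        attn = softmax(logits, axis=1)       # (N, num_slots), normalised over slots
        # update each slot to the attention-weighted mean of its pixels
        weights = attn / (attn.sum(0, keepdims=True) + 1e-8)
        slots = (weights[:, :, None] * feats[:, None, :]).sum(0)
    masks = attn.argmax(1).reshape(h, w)     # hard assignment -> segmentation
    return masks, slots

# usage: a static background with one moving square is separated by motion alone
flow = np.zeros((8, 8, 2))
flow[2:5, 2:5, 0] = 3.0                      # square moving right
masks, slots = motion_grouping(flow)
```

Even this crude version recovers the moving square from flow alone, which mirrors the abstract's point that motion, with no appearance information, can carry enough signal to segment the primary object from the background.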