RPM-Net: Robust Pixel-Level Matching Networks for Self-Supervised Video Object Segmentation

Youngeun Kim, Seokeon Choi, Hankyeol Lee, Taekyung Kim, Changick Kim; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 2057-2065

Abstract


In this paper, we introduce a self-supervised approach to video object segmentation that requires no human-labeled data. Specifically, we present Robust Pixel-level Matching Networks (RPM-Net), a novel deep architecture that matches pixels between adjacent frames using only color information from unlabeled videos for training. Technically, RPM-Net consists of two main modules. The embedding module first projects input images into a high-dimensional embedding space. The matching module then matches pixels between the reference and target frames based on the embedding features, using deformable convolution layers. Unlike previous supervised methods that use deformable convolution, our matching module adopts it to focus on similar features among spatiotemporally neighboring pixels. We further propose an online updating module that refines the segmentation result by transferring knowledge from the given first frame. Finally, we carry out comprehensive experiments on three public datasets (i.e., DAVIS-2017, SegTrack-v2, and YouTube-Objects) and achieve state-of-the-art performance on self-supervised video object segmentation.
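To make the two-module design concrete, below is a minimal sketch in PyTorch of an embedding module followed by a matching module whose deformable convolution samples the reference features at offsets predicted from both frames' embeddings. All module names, layer sizes, and the offset-prediction head are illustrative assumptions, not the authors' implementation; only torchvision's `DeformConv2d` operator is taken as given.

```python
# Minimal sketch of the embedding + matching modules described in the abstract.
# Layer names and channel sizes are assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class EmbeddingModule(nn.Module):
    """Projects an RGB frame into a high-dimensional embedding space."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, embed_dim, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, frame):
        return self.encoder(frame)


class MatchingModule(nn.Module):
    """Matches pixels between reference and target embeddings.

    A deformable convolution samples spatiotemporally neighboring reference
    pixels; its sampling offsets are predicted from the concatenation of the
    reference and target embeddings.
    """
    def __init__(self, embed_dim=64, kernel_size=3):
        super().__init__()
        offset_channels = 2 * kernel_size * kernel_size  # (dx, dy) per tap
        self.offset_pred = nn.Conv2d(2 * embed_dim, offset_channels,
                                     kernel_size=3, padding=1)
        self.deform_conv = DeformConv2d(embed_dim, embed_dim,
                                        kernel_size=kernel_size, padding=1)

    def forward(self, ref_feat, tgt_feat):
        offsets = self.offset_pred(torch.cat([ref_feat, tgt_feat], dim=1))
        # Sample reference features at the predicted offsets so they align
        # with the target frame.
        return self.deform_conv(ref_feat, offsets)


if __name__ == "__main__":
    embed, match = EmbeddingModule(), MatchingModule()
    ref, tgt = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
    matched = match(embed(ref), embed(tgt))
    print(matched.shape)  # torch.Size([1, 64, 64, 64])
```

In a self-supervised setup of this kind, such a matched feature map would typically be decoded back to color and trained against the target frame, so no segmentation labels are needed; the details of that loss and of the online updating module are beyond this sketch.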

Related Material


[bibtex]
@InProceedings{Kim_2020_WACV,
    author    = {Kim, Youngeun and Choi, Seokeon and Lee, Hankyeol and Kim, Taekyung and Kim, Changick},
    title     = {RPM-Net: Robust Pixel-Level Matching Networks for Self-Supervised Video Object Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {March},
    year      = {2020}
}