DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation

Zeng, Xiaohui; Liao, Renjie; Gu, Li; Xiong, Yuwen; Fidler, Sanja; Urtasun, Raquel

Xiaohui Zeng, Renjie Liao, Li Gu, Yuwen Xiong, Sanja Fidler, Raquel Urtasun; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 3929-3938

Abstract

In this paper, we propose the differentiable mask-matching network (DMM-Net) for solving the video object segmentation problem where the initial object masks are provided. Relying on the Mask R-CNN backbone, we extract mask proposals per frame and formulate the matching between object templates and proposals as a linear assignment problem where thA heading inside a blocke cost matrix is predicted by a deep convolutional neural network. We propose a differentiable matching layer which unrolls a projected gradient descent algorithm in which the projection step exploits the Dykstra's algorithm. We prove that under mild conditions, the matching is guaranteed to converge to the optimal one. In practice, it achieves similar performance compared to the Hungarian algorithm during inference. Meanwhile, we can back-propagate through it to learn the cost matrix. After matching, a U-Net style architecture is exploited to refine the matched mask per time step. On DAVIS 2017 dataset, DMM-Net achieves the best performance without online learning on the first frames and the 2nd best with it. Without any fine-tuning, DMM-Net performs comparably to state-of-the-art methods on SegTrack v2 dataset. At last, our differentiable matching layer is very simple to implement; we attach the PyTorch code in the supplementary material which is less than 50 lines long.

Related Material

[pdf] [supp]

[bibtex]

@InProceedings{Zeng_2019_ICCV,
author = {Zeng, Xiaohui and Liao, Renjie and Gu, Li and Xiong, Yuwen and Fidler, Sanja and Urtasun, Raquel},
title = {DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}