Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks

Jae Shin Yoon, Francois Rameau, Junsik Kim, Seokju Lee, Seunghak Shin, In So Kweon; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2167-2176

Abstract


We propose a novel video object segmentation algorithm based on pixel-level matching using Convolutional Neural Networks (CNNs). Our network aims to distinguish the target area from the background on the basis of the pixel-level similarity between two object units. The proposed network represents a target object using features from layers at different depths in order to take advantage of both spatial details and category-level semantic information. Furthermore, we propose a feature compression technique that drastically reduces memory requirements while maintaining the capability of feature representation. Two-stage training (pre-training and fine-tuning) allows our network to handle any target object regardless of its category (even if the object's type does not appear in the pre-training data) and regardless of variations in its appearance through a video sequence. Experiments on large datasets demonstrate the effectiveness of our model compared with related methods in terms of accuracy, speed, and stability. Finally, we demonstrate the transferability of our network to different domains, such as the infrared data domain.
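
As a rough illustration of the pixel-level matching idea (not the network described in the paper), the sketch below compares the feature vector of every pixel in a query frame against feature vectors sampled from the target object, and marks a pixel as foreground when its best match is similar enough. The function name `match_pixels`, the use of cosine similarity, and the threshold value are assumptions made for this example; the paper learns the similarity with a CNN rather than using a fixed metric.

```python
import numpy as np

def match_pixels(query_feats, target_feats, threshold=0.5):
    """Label each query pixel as foreground if it resembles any target pixel.

    query_feats:  (H, W, C) feature map of the current frame (hypothetical input).
    target_feats: (N, C) feature vectors sampled from the target object.
    Returns a boolean (H, W) mask.
    """
    h, w, c = query_feats.shape
    q = query_feats.reshape(-1, c).astype(np.float64)
    t = target_feats.astype(np.float64)
    # L2-normalise so that the dot product equals cosine similarity.
    q /= np.linalg.norm(q, axis=1, keepdims=True) + 1e-12
    t /= np.linalg.norm(t, axis=1, keepdims=True) + 1e-12
    sim = q @ t.T              # (H*W, N) pairwise similarities
    best = sim.max(axis=1)     # best-matching target pixel per query pixel
    return (best > threshold).reshape(h, w)
```

In practice the per-pixel features might come from several CNN layers concatenated along the channel axis, which is one way to combine spatial detail with semantic information as the abstract describes; that choice is likewise an assumption here.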

Related Material


@InProceedings{Yoon_2017_ICCV,
author = {Yoon, Jae Shin and Rameau, Francois and Kim, Junsik and Lee, Seokju and Shin, Seunghak and Kweon, In So},
title = {Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017}
}