Video Instance Segmentation 2019: A Winning Approach for Combined Detection, Segmentation, Classification and Tracking.

Luiten, Jonathon; Torr, Philip; Leibe, Bastian

Jonathon Luiten, Philip Torr, Bastian Leibe; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2019

Abstract

Video Instance Segmentation (VIS) is the task of localizing all objects in a video, segmenting them, tracking them throughout the video and classifying them into a set of predefined classes. In this work, we divide VIS into four parts: detection, segmentation, tracking and classification. We then develop algorithms for performing each of these four sub-tasks individually, and combine them into a complete solution for VIS. Our solution is an adaptation of UnOVOST, the current best-performing algorithm for Unsupervised Video Object Segmentation, to the VIS task. We benchmark our algorithm on the 2019 YouTube-VIS Challenge, where we obtain first place with an mAP score of 46.7%.
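
The four-part decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Track` and `run_vis`, the box and mask representations, and the stage signatures are all hypothetical, chosen only to show how four independently developed stages might be chained into one solution.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical toy representations (assumptions, not from the paper).
Box = Tuple[int, int, int, int]        # (x1, y1, x2, y2)
Mask = FrozenSet[Tuple[int, int]]      # set of (row, col) pixels


@dataclass
class Track:
    """One object instance followed through the video."""
    track_id: int
    masks: Dict[int, Mask]             # frame index -> mask in that frame
    label: str = ""                    # filled in by the classification stage


def run_vis(
    frames: Sequence[Any],
    detect: Callable[[Any], List[Box]],
    segment: Callable[[Any, Box], Mask],
    track: Callable[[List[List[Mask]]], List[Track]],
    classify: Callable[[Sequence[Any], Track], str],
) -> List[Track]:
    """Chain the four sub-tasks; each stage is supplied by the caller."""
    # Stages 1 and 2: detect objects in every frame, then segment each detection.
    per_frame_masks: List[List[Mask]] = []
    for frame in frames:
        boxes = detect(frame)
        per_frame_masks.append([segment(frame, box) for box in boxes])

    # Stage 3: link per-frame masks into tracks across the video.
    tracks = track(per_frame_masks)

    # Stage 4: assign one class label to each track.
    for t in tracks:
        t.label = classify(frames, t)
    return tracks
```

Passing the stages as callables keeps each sub-task replaceable on its own, which mirrors the idea of solving the four parts individually before combining them.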

Related Material

[bibtex]

@InProceedings{Luiten_2019_ICCV,
author = {Luiten, Jonathon and Torr, Philip and Leibe, Bastian},
title = {Video Instance Segmentation 2019: A Winning Approach for Combined Detection, Segmentation, Classification and Tracking.},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2019}
}