Unsupervised Object Discovery and Tracking in Video Collections

Suha Kwak, Minsu Cho, Ivan Laptev, Jean Ponce, Cordelia Schmid; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 3173-3181

Abstract


This paper addresses the problem of automatically localizing dominant objects as spatio-temporal tubes in a noisy collection of videos with minimal or even no supervision. We formulate the problem as a combination of two complementary processes: discovery and tracking. The first one establishes correspondences between prominent regions across videos, and the second one associates similar object regions within the same video. Interestingly, our algorithm also discovers the implicit topology of frames associated with instances of the same object class across different videos, a role normally left to supervisory information in the form of class labels in conventional image and video understanding methods. Indeed, as demonstrated by our experiments, our method can handle video collections featuring multiple object classes, and substantially outperforms the state of the art in colocalization, even though it tackles a broader problem with much less supervision.

Related Material


[bibtex]
@InProceedings{Kwak_2015_ICCV,
  author    = {Kwak, Suha and Cho, Minsu and Laptev, Ivan and Ponce, Jean and Schmid, Cordelia},
  title     = {Unsupervised Object Discovery and Tracking in Video Collections},
  booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  month     = {December},
  year      = {2015}
}