Learning Instance Segmentation by Interaction

Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018, pp. 2042-2045


We present an approach for building an active agent that learns to segment its visual observations into individual objects by interacting with its environment in a completely self-supervised manner. The agent uses its current segmentation model to infer which pixels constitute objects and refines the model by interacting with those pixels. The model, learned from over 50K interactions, generalizes to novel objects and backgrounds. Interaction is a natural but noisy source of training data. To deal with the noisy training signal, we propose a robust set loss, and we provide a benchmark dataset comprising robot interactions along with a few human-labeled examples for future research to build upon. We provide evidence that reorganizing visual observations into objects yields a powerful representation for downstream vision-based control tasks. Our system is capable of rearranging multiple objects into target configurations from visual inputs alone. Full paper available at https://pathak22.github.io
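
To make the idea of a noisy, interaction-derived training signal concrete, the sketch below derives a rough mask from the pixels that change between the frames before and after an interaction, then checks whether a predicted mask agrees with it up to an overlap threshold instead of pixel by pixel. This is only an illustration under stated assumptions: frame differencing as the source of the noisy mask, the function names, and both thresholds are hypothetical, and the code is not the robust set loss proposed in the paper.

```python
import numpy as np


def change_mask(before, after, threshold=0.1):
    """Boolean mask of pixels that changed by more than `threshold`.

    Assumption: frames are arrays of equal shape, either (H, W) or (H, W, C).
    """
    diff = np.abs(after.astype(np.float64) - before.astype(np.float64))
    if diff.ndim == 3:
        # Collapse colour channels: a pixel changed if any channel changed.
        diff = diff.max(axis=-1)
    return diff > threshold


def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Two empty masks are treated as identical.
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


def agrees_with_noisy_label(pred_mask, noisy_mask, min_iou=0.5):
    """Accept a prediction that overlaps the noisy mask well enough,
    rather than requiring an exact per-pixel match."""
    return iou(pred_mask, noisy_mask) >= min_iou
```

Tolerating a bounded amount of disagreement is one simple way to avoid penalizing a model for the label noise that interaction inevitably introduces; the paper's own loss should be consulted for the actual formulation.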

Related Material

author = {Pathak, Deepak and Shentu, Yide and Chen, Dian and Agrawal, Pulkit and Darrell, Trevor and Levine, Sergey and Malik, Jitendra},
title = {Learning Instance Segmentation by Interaction},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2018}