Cut and Learn for Unsupervised Object Detection and Instance Segmentation

Xudong Wang, Rohit Girdhar, Stella X. Yu, Ishan Misra; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 3124-3134

Abstract


We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models. We leverage the property of self-supervised models to 'discover' objects without supervision and amplify it to train a state-of-the-art localization model without any human labels. CutLER first uses our proposed MaskCut approach to generate coarse masks for multiple objects in an image, and then learns a detector on these masks using our robust loss function. We further improve performance by self-training the model on its predictions. Compared to prior work, CutLER is simpler, compatible with different detection architectures, and detects multiple objects. CutLER is also a zero-shot unsupervised detector and improves detection performance AP_50 by over 2.7x on 11 benchmarks across domains like video frames, paintings, sketches, etc. With finetuning, CutLER serves as a low-shot detector surpassing MoCo-v2 by 7.3% AP^box and 6.6% AP^mask on COCO when training with 5% labels.

Related Material


@InProceedings{Wang_2023_CVPR,
    author    = {Wang, Xudong and Girdhar, Rohit and Yu, Stella X. and Misra, Ishan},
    title     = {Cut and Learn for Unsupervised Object Detection and Instance Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {3124-3134}
}