The Role of Data for One-Shot Semantic Segmentation

Timo Luddecke, Alexander Ecker; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, pp. 2653-2658

Abstract


In this work we investigate the potential of larger datasets for one-shot semantic segmentation. While computer vision models are often trained on millions of diverse samples, current one-shot semantic segmentation datasets encompass only a small number of samples (Pascal-5i) or classes (Pascal-5i and COCO-20i), or exhibit little variability (FSS-1000). To improve this situation, we introduce LVIS-OneShot, a one-shot variant of the LVIS dataset. With 718 classes and 114,347 images, it substantially exceeds previous datasets in size. Through controlled experiments we show that not only the number of images but also the number of distinct classes is crucial. We analyze transfer learning across common datasets and find that training on LVIS-OneShot lets us outperform current state-of-the-art models on Pascal-5i. In particular, we observe that a simple baseline model (MaRF) learns to perform one-shot segmentation when trained on a large dataset, even though it has a generic architecture without strong inductive biases. Code and dataset are available here: eckerlab.org/code/one-shot-segmentation
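For readers unfamiliar with the task setup, the sketch below illustrates a typical one-shot segmentation episode: the model receives a support image together with a binary mask of a novel class and must segment that class in a query image. This is a minimal PyTorch illustration of the general conditioning pattern (masked average pooling into a class prototype), not the paper's MaRF architecture; the names `OneShotSegmenter` and `feat_dim` are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class OneShotSegmenter(nn.Module):
    """Hypothetical baseline: condition query features on a prototype
    pooled from the masked support image, then predict a binary mask.
    Illustrative only; not the MaRF model from the paper."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(feat_dim, 1, 1)

    def forward(self, support_img, support_mask, query_img):
        s_feat = self.encoder(support_img)   # (B, C, H, W)
        q_feat = self.encoder(query_img)
        # Masked average pooling: one prototype vector per episode.
        mask = support_mask.unsqueeze(1)     # (B, 1, H, W)
        proto = (s_feat * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1)
        # Condition query features on the support-class prototype.
        q_feat = q_feat * proto[:, :, None, None]
        return self.head(q_feat)            # query mask logits, (B, 1, H, W)

# One episode: support image + mask of a novel class, plus a query image.
model = OneShotSegmenter()
support_img = torch.randn(1, 3, 128, 128)
support_mask = (torch.randn(1, 128, 128) > 0).float()
query_img = torch.randn(1, 3, 128, 128)
pred_mask = model(support_img, support_mask, query_img).sigmoid() > 0.5
```

Because the model only sees the novel class through the support mask, performance hinges on how many distinct classes and images it was trained on, which is the variable the paper's controlled experiments isolate.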

Related Material


[bibtex]
@InProceedings{Luddecke_2021_CVPR,
    author    = {Luddecke, Timo and Ecker, Alexander},
    title     = {The Role of Data for One-Shot Semantic Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2021},
    pages     = {2653-2658}
}