Unsupervised Salient Instance Detection

Xin Tian, Ke Xu, Rynson Lau; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 2702-2712

Abstract


The significant amount of manual effort required to annotate pixel-level labels has driven the advancement of unsupervised saliency learning. However, without supervision signals, state-of-the-art methods can only infer region-level saliency. In this paper, we propose to explore the unsupervised salient instance detection (USID) problem for a more fine-grained visual understanding. Our key observation is that self-supervised transformer features may exhibit local similarities as well as different levels of contrast to other regions, which provide informative cues to identify salient instances. Hence, we propose SCoCo, a novel network that models saliency coherence and contrast for USID. SCoCo includes two novel modules: (1) a global background adaptation (GBA) module with a scene-level contrastive loss to extract salient regions from the scene by searching for the adaptive "saliency threshold" in the self-supervised transformer features, and (2) a locality-aware similarity (LAS) module with an instance-level contrastive loss to group salient regions into instances by modeling the in-region saliency coherence and cross-region saliency contrasts. Extensive experiments show that SCoCo outperforms state-of-the-art weakly-supervised SID methods and carefully designed unsupervised baselines, and has comparable performance to fully-supervised SID methods.

Related Material


[bibtex]
@InProceedings{Tian_2024_CVPR,
    author    = {Tian, Xin and Xu, Ke and Lau, Rynson},
    title     = {Unsupervised Salient Instance Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {2702-2712}
}