conSAMme: Achieving Consistent Segmentations with SAM

Josh Myers-Dean, Kangning Liu, Brian Price, Yifei Fan, Jason Kuen, Danna Gurari; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops, 2025, pp. 759-768

Abstract


Multi-output interactive segmentation methods generate multiple binary masks from user guidance, such as clicks. However, when given slightly different user guidance, it is unpredictable whether the masks will appear in the same order or whether the masks themselves will remain the same. To address these issues, we propose conSAMme, a contrastive learning framework that conditions on explicit hierarchical semantics and leverages weakly supervised part segmentation data together with a novel episodic click sampling strategy. Evaluations of conSAMme's segmentation performance, click robustness, and mask ordering show substantial improvements over baselines, while using less than 1% additional training data relative to the baseline.
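To make the consistency goal concrete, the sketch below illustrates one way an episode of slightly perturbed clicks could be sampled for a single object and how mask consistency could be scored via pairwise IoU. This is an illustrative assumption, not the paper's implementation; the function names (sample_click_episode, pairwise_iou) and parameters are hypothetical, and only NumPy is assumed.

    import numpy as np

    def sample_click_episode(gt_mask, num_clicks=4, jitter=5, rng=None):
        # Sample a base click inside a ground-truth object mask, then add
        # jittered variants that still land on the object (hypothetical helper).
        rng = np.random.default_rng(rng)
        ys, xs = np.nonzero(gt_mask)
        idx = rng.integers(len(ys))
        base = np.array([ys[idx], xs[idx]])
        clicks = [tuple(base)]
        for _ in range(100 * num_clicks):  # bounded attempts to avoid looping forever
            if len(clicks) == num_clicks:
                break
            cand = base + rng.integers(-jitter, jitter + 1, size=2)
            cand = np.clip(cand, 0, np.array(gt_mask.shape) - 1)
            if gt_mask[cand[0], cand[1]]:  # keep only clicks that hit the object
                clicks.append(tuple(cand))
        return clicks

    def pairwise_iou(masks):
        # Mean IoU over all pairs of binary masks; 1.0 means fully consistent
        # predictions across the episode of perturbed clicks.
        ious = []
        for i in range(len(masks)):
            for j in range(i + 1, len(masks)):
                inter = np.logical_and(masks[i], masks[j]).sum()
                union = np.logical_or(masks[i], masks[j]).sum()
                ious.append(inter / union if union else 1.0)
        return float(np.mean(ious)) if ious else 1.0

In such a setup, each click in the episode would be fed to the segmenter independently, and the resulting masks scored with pairwise_iou to quantify robustness to small changes in user guidance.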

Related Material


@InProceedings{Myers-Dean_2025_CVPR,
  author    = {Myers-Dean, Josh and Liu, Kangning and Price, Brian and Fan, Yifei and Kuen, Jason and Gurari, Danna},
  title     = {conSAMme: Achieving Consistent Segmentations with SAM},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
  month     = {June},
  year      = {2025},
  pages     = {759-768}
}