conSAMme: Achieving Consistent Segmentations with SAM

Josh Myers-Dean, Kangning Liu, Brian Price, Yifei Fan, Jason Kuen, Danna Gurari; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops, 2025, pp. 759-768

Abstract


Multi-output interactive segmentation methods generate multiple binary masks from user guidance, such as clicks. However, when given slightly different user guidance, it is unpredictable whether the masks will appear in the same order or whether the masks themselves will remain the same. To address these issues, we propose conSAMme, a contrastive learning framework that conditions on explicit hierarchical semantics and leverages weakly supervised part segmentation data together with a novel episodic click sampling strategy. Evaluations of conSAMme's segmentation performance, click robustness, and mask ordering show substantial improvements over baselines, while using less than 1% additional training data relative to the baseline.
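To make the consistency goal concrete, the sketch below illustrates one way an episode of slightly perturbed clicks could be sampled for a single object and how mask consistency could be scored via pairwise IoU. This is an illustrative assumption, not the paper's implementation; the function names (sample_click_episode, pairwise_iou) and parameters are hypothetical, and only NumPy is assumed.

    import numpy as np

    def sample_click_episode(gt_mask, num_clicks=4, jitter=5, rng=None):
        # Sample a base click inside a ground-truth object mask, then add
        # jittered variants that still land on the object (hypothetical helper).
        rng = np.random.default_rng(rng)
        ys, xs = np.nonzero(gt_mask)
        idx = rng.integers(len(ys))
        base = np.array([ys[idx], xs[idx]])
        clicks = [tuple(base)]
        for _ in range(100 * num_clicks):  # bounded attempts to avoid looping forever
            if len(clicks) == num_clicks:
                break
            cand = base + rng.integers(-jitter, jitter + 1, size=2)
            cand = np.clip(cand, 0, np.array(gt_mask.shape) - 1)
            if gt_mask[cand[0], cand[1]]:  # keep only clicks that hit the object
                clicks.append(tuple(cand))
        return clicks

    def pairwise_iou(masks):
        # Mean IoU over all pairs of binary masks; 1.0 means fully consistent
        # predictions across the episode of perturbed clicks.
        ious = []
        for i in range(len(masks)):
            for j in range(i + 1, len(masks)):
                inter = np.logical_and(masks[i], masks[j]).sum()
                union = np.logical_or(masks[i], masks[j]).sum()
                ious.append(inter / union if union else 1.0)
        return float(np.mean(ious)) if ious else 1.0

In such a setup, each click in the episode would be fed to the segmenter independently, and the resulting masks scored with pairwise_iou to quantify robustness to small changes in user guidance.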

Related Material


@InProceedings{Myers-Dean_2025_CVPR,
  author    = {Myers-Dean, Josh and Liu, Kangning and Price, Brian and Fan, Yifei and Kuen, Jason and Gurari, Danna},
  title     = {conSAMme: Achieving Consistent Segmentations with SAM},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
  month     = {June},
  year      = {2025},
  pages     = {759-768}
}