T-SAM: Transductive Learning for Segment Anything Model

Rangel Daroya, Deepak Chandran, Subhransu Maji, Andrea Fanelli; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops, 2025, pp. 6419-6428

Abstract


The Segment Anything Model (SAM) has seen widespread use by relying on user prompts to segment objects at test time. While this can be effective when only a few images need to be segmented, prompting becomes prohibitive when labeling a large number of images or images containing many instances (e.g., segmenting a large group of cells or a migrating flock of birds). We introduce transductive learning for SAM (T-SAM), which uses a few labeled examples to learn a prompt representation that generalizes to other test images, including out-of-domain classes. T-SAM uses a cross-attention mechanism to learn distinguishing features from the labeled images so that similar instances of the same object can be segmented in other images. We also present an alternative method that uses the average representation of previously labeled images. While prior works have applied model fine-tuning or memory banks, the large number of parameters involved and storage constraints limit their applicability. We show that our method performs competitively with baseline methods and scales with an increasing number of labeled images.
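The abstract does not specify the architecture, but the core idea, attending over embeddings of a few labeled examples to produce a reusable prompt representation (with simple averaging as an alternative), can be illustrated with a minimal PyTorch sketch. All names, shapes, and the use of a single learnable query token below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class PromptAggregator(nn.Module):
    """Toy cross-attention aggregator (assumed design): a learnable query
    attends over embeddings of labeled examples to produce one prompt token."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # learnable prompt query
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, support_embeddings: torch.Tensor) -> torch.Tensor:
        # support_embeddings: (1, N, dim), one embedding per labeled example
        prompt, _ = self.attn(self.query, support_embeddings, support_embeddings)
        return prompt  # (1, 1, dim), usable as a prompt token for a mask decoder


def average_prompt(support_embeddings: torch.Tensor) -> torch.Tensor:
    """Alternative mentioned in the abstract: average the labeled-example
    embeddings to form the prompt (again, a sketch of the idea only)."""
    return support_embeddings.mean(dim=1, keepdim=True)


if __name__ == "__main__":
    support = torch.randn(1, 5, 256)  # 5 labeled examples, 256-d embeddings
    agg = PromptAggregator()
    print(agg(support).shape)             # torch.Size([1, 1, 256])
    print(average_prompt(support).shape)  # torch.Size([1, 1, 256])
```

In a SAM-like pipeline, the resulting prompt token would replace user-provided point or box prompts at test time; the actual interface to the mask decoder is not described in the abstract.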

Related Material


[pdf]
[bibtex]
@InProceedings{Daroya_2025_CVPR,
  author    = {Daroya, Rangel and Chandran, Deepak and Maji, Subhransu and Fanelli, Andrea},
  title     = {T-SAM: Transductive Learning for Segment Anything Model},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
  month     = {June},
  year      = {2025},
  pages     = {6419-6428}
}