Training-Free Few-Shot Segmentation via Vision-Language Guided Prompting

Euihyun Yoon, Taejin Park, Jaekoo Lee; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026, pp. 6517-6526

Abstract


Object segmentation relies heavily on costly pixel-level annotations and struggles to generalize to unseen domains. The recent introduction of the Segment Anything Model (SAM), a foundation model for segmentation, offers a prompt-driven, zero-shot capability that has been applied in various domains (e.g., autonomous driving, satellite imagery, medical imaging) and extended to Few-Shot Segmentation (FSS) tasks. However, existing SAM-based FSS methods typically generate prompts by measuring support-query image similarity with a vision encoder, which biases the prompts toward the support images and fails under significant support-query context shifts. To address this limitation, we propose a training-free FSS approach that combines visual and textual cues to generate effective prompts for the target class. By leveraging both vision and language information, our approach bridges the support-query gap and guides SAM to segment novel objects more reliably. Without any additional training, our method outperforms previous state-of-the-art FSS methods on established benchmarks (COCO-20^i, PASCAL-5^i), demonstrating its effectiveness and robust generalization. Our code is publicly available on GitHub.
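
The core idea of language-guided prompting can be illustrated with a short sketch. The code below is a minimal, illustrative approximation and not the authors' released implementation: it assumes the openai CLIP and segment-anything packages, scores sliding-window crops of the query image against a CLIP text embedding of the class name to pick a foreground point, and feeds that point to SAM as a prompt. The checkpoint and image paths (sam_vit_h.pth, query.jpg) are placeholders, and the support-image (visual) cue used by the paper is omitted for brevity.

# Minimal sketch: text-guided point prompting for SAM.
# Not the paper's exact pipeline; an illustration of the general idea.
import numpy as np
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"

clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder path
predictor = SamPredictor(sam.to(device))

def text_guided_point(image, class_name, win=128, stride=64):
    """Return the (x, y) center of the crop most similar to the class prompt."""
    with torch.no_grad():
        text = clip.tokenize([f"a photo of a {class_name}"]).to(device)
        text_feat = clip_model.encode_text(text)
        text_feat /= text_feat.norm(dim=-1, keepdim=True)

    best_score = -2.0
    best_xy = (image.width // 2, image.height // 2)
    # Slide a window over the query image and keep the crop whose CLIP
    # embedding is most similar to the text embedding of the target class.
    for top in range(0, max(1, image.height - win), stride):
        for left in range(0, max(1, image.width - win), stride):
            crop = image.crop((left, top, left + win, top + win))
            with torch.no_grad():
                feat = clip_model.encode_image(
                    clip_preprocess(crop).unsqueeze(0).to(device))
                feat /= feat.norm(dim=-1, keepdim=True)
                score = (feat @ text_feat.T).item()
            if score > best_score:
                best_score = score
                best_xy = (left + win // 2, top + win // 2)
    return best_xy

query = Image.open("query.jpg").convert("RGB")  # placeholder query image
point = text_guided_point(query, "dog")

predictor.set_image(np.array(query))
masks, scores, _ = predictor.predict(
    point_coords=np.array([point]),
    point_labels=np.array([1]),  # 1 = foreground point
    multimask_output=True,
)
best_mask = masks[scores.argmax()]  # SAM's highest-confidence mask

A full vision-language pipeline along the lines the abstract describes would fuse this text-derived cue with support-image features before prompting SAM; the sketch shows only the language half.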

Related Material


[bibtex]
@InProceedings{Yoon_2026_WACV,
    author    = {Yoon, Euihyun and Park, Taejin and Lee, Jaekoo},
    title     = {Training-Free Few-Shot Segmentation via Vision-Language Guided Prompting},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {March},
    year      = {2026},
    pages     = {6517-6526}
}