TFM^2: Training-Free Mask Matching for Open-Vocabulary Semantic Segmentation

Yaoxin Zhuo, Zachary Bessinger, Lichen Wang, Naji Khosravan, Baoxin Li, Sing Bing Kang; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 4693-4703

Abstract


The potential of Open-Vocabulary Semantic Segmentation (OVSS) in few-shot scenarios has not been fully explored, owing to the complexity of extending few-shot concepts to semantic segmentation tasks. To address this challenge, we propose Training-Free Mask Matching (TFM^2), an efficient mask-based adapter method that enhances OVSS models for the few-shot open-vocabulary semantic segmentation task. TFM^2 is a key-value cache that is explicitly designed for image masks. We introduce three modules to construct and refine the mask cache, which in turn improves OVSS mask classification performance. Comprehensive experiments demonstrate that TFM^2 improves the performance of state-of-the-art OVSS methods by 1% to 5% across different settings. Moreover, TFM^2 is not limited to any specific method or backbone. This work underscores the importance and potential of few-shot data in OVSS and presents a significant step toward leveraging this potential.
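The abstract does not spell out the three modules, so the following is only a minimal sketch of the general idea of a training-free key-value cache applied to mask classification, not the authors' implementation. It assumes each few-shot mask is summarized by an embedding (the key) with a one-hot class label (the value), and that cache evidence is added to the base OVSS model's class logits. All names (`cache_logits`, `classify_mask`) and the parameters `alpha` and `beta` are hypothetical choices made for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two embeddings."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

def cache_logits(query, keys, values, beta=5.0):
    """Sum the one-hot values, each weighted by exp(-beta * (1 - cos))."""
    num_classes = len(values[0])
    logits = [0.0] * num_classes
    for key, value in zip(keys, values):
        affinity = math.exp(-beta * (1.0 - cosine(query, key)))
        for c in range(num_classes):
            logits[c] += affinity * value[c]
    return logits

def classify_mask(query, keys, values, base_logits, alpha=1.0, beta=5.0):
    """Fuse base-model logits with cache logits and return the top class index."""
    cached = cache_logits(query, keys, values, beta)
    fused = [b + alpha * c for b, c in zip(base_logits, cached)]
    return max(range(len(fused)), key=fused.__getitem__)

# Toy cache (assumed data): one few-shot mask embedding per class.
keys = [[1.0, 0.0], [0.0, 1.0]]      # mask embeddings (keys)
values = [[1.0, 0.0], [0.0, 1.0]]    # one-hot class labels (values)

query = [0.9, 0.1]                   # embedding of a mask to classify
base_logits = [0.4, 0.5]             # base model slightly prefers class 1

prediction = classify_mask(query, keys, values, base_logits)
```

In this toy setup the query embedding lies close to the cached class-0 mask, so the cache term outweighs the base model's slight preference for class 1. No parameters are trained: adapting to new few-shot data only means adding entries to `keys` and `values`.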

Related Material


[bibtex]
@InProceedings{Zhuo_2025_WACV,
    author    = {Zhuo, Yaoxin and Bessinger, Zachary and Wang, Lichen and Khosravan, Naji and Li, Baoxin and Kang, Sing Bing},
    title     = {TFM{\textasciicircum}2: Training-Free Mask Matching for Open-Vocabulary Semantic Segmentation},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {4693-4703}
}