Overcoming Shortcut Problem in VLM for Robust Out-of-Distribution Detection

Zhuo Xu, Xiang Xiang, Yifan Liang; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 15402-15412

Abstract


Vision-language models (VLMs), such as CLIP, have shown remarkable capabilities in downstream tasks. However, the coupling of semantic information between the foreground and the background in images leads to significant shortcut issues that adversely affect out-of-distribution (OOD) detection abilities. When confronted with a background OOD sample, VLMs are prone to misidentifying it as in-distribution (ID) data. In this paper, we analyze the OOD problem from the perspective of shortcuts in VLMs and propose OSPCoOp which includes background decoupling and mask-guided region regularization. We first decouple images into ID-relevant and ID-irrelevant regions and utilize the latter to generate a large number of augmented OOD background samples as pseudo-OOD supervision. We then use the masks from background decoupling to adjust the model's attention, minimizing its focus on ID-irrelevant regions. To assess the model's robustness against background interference, we introduce a new OOD evaluation dataset, ImageNet-Bg, which solely consists of background images with all ID-relevant regions removed. Our method demonstrates exceptional performance in few-shot scenarios, achieving strong results even in one-shot setting, and outperforms existing methods. The code and proposed ImageNet-Bg are available at https://github.com/HAIV-Lab/OSPCoOp_Imagenet-bg.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Xu_2025_CVPR, author = {Xu, Zhuo and Xiang, Xiang and Liang, Yifan}, title = {Overcoming Shortcut Problem in VLM for Robust Out-of-Distribution Detection}, booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)}, month = {June}, year = {2025}, pages = {15402-15412} }