Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models
Abstract
Pre-trained vision-language (VL) models such as CLIP have shown significant generalization ability on downstream tasks even with minimal fine-tuning. While prompt learning has emerged as an effective strategy for adapting pre-trained VL models to downstream tasks, current approaches frequently suffer severe overfitting to the specific downstream data distribution. This overfitting constrains the original ability of the VL model to generalize to new domains or unseen classes, posing a critical challenge to the adaptability and generalization of VL models. To address this limitation, we propose Style-Pro, a novel style-guided prompt learning framework that mitigates overfitting and preserves the zero-shot generalization capabilities of CLIP. Style-Pro employs learnable style bases to synthesize diverse distribution shifts, guided by two specialized loss functions that ensure style diversity and content integrity. Then, to minimize the discrepancy between unseen domains and the source domain, Style-Pro maps unseen styles into the known style representation space as a weighted combination of the style bases. Moreover, to maintain consistency between the style-shifted prompted model and the original frozen CLIP, Style-Pro introduces consistency constraints that preserve alignment in the learned embeddings, minimizing deviation during adaptation to downstream tasks. Extensive experiments across 11 benchmark datasets demonstrate the effectiveness of Style-Pro, which consistently surpasses state-of-the-art methods in various settings, including base-to-new generalization, cross-dataset transfer, and domain generalization.
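The abstract describes two mechanisms: expressing an unseen style as a weighted combination of learnable style bases, and constraining the prompted model to stay consistent with the original frozen CLIP. The PyTorch sketch below illustrates one plausible reading of both ideas; it is not the authors' released code, and every name in it (StyleBasisBank, num_bases, the use of mean/std feature statistics as "style", the softmax weighting, and the cosine consistency loss) is an assumption made for illustration only.

# Minimal sketch (assumed implementation, not the paper's code): learnable style bases,
# projection of an unseen style onto the bases via a weighted combination, a simple
# style-diversity penalty, and a consistency loss against frozen-CLIP embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StyleBasisBank(nn.Module):
    """Learnable style bases; an arbitrary style is expressed as a convex combination of them."""

    def __init__(self, num_bases: int = 8, feat_dim: int = 512):
        super().__init__()
        # Each basis stores a (mean, std) pair, i.e. simple feature statistics used as "style".
        self.mu = nn.Parameter(torch.randn(num_bases, feat_dim) * 0.02)
        self.sigma = nn.Parameter(torch.ones(num_bases, feat_dim))

    def project(self, feat: torch.Tensor):
        """Map the style of `feat` (batch, dim) into the span of the known style bases."""
        # Attention-like weights over the bases, normalized to sum to 1.
        weights = F.softmax(feat @ self.mu.t() / feat.shape[-1] ** 0.5, dim=-1)  # (batch, num_bases)
        mixed_mu = weights @ self.mu          # (batch, dim) weighted combination of basis means
        mixed_sigma = weights @ self.sigma    # (batch, dim) weighted combination of basis stds
        return mixed_mu, mixed_sigma, weights

    def diversity_loss(self) -> torch.Tensor:
        """Encourage the bases to stay mutually dissimilar (one plausible 'style diversity' term)."""
        normed = F.normalize(self.mu, dim=-1)
        gram = normed @ normed.t()
        off_diag = gram - torch.eye(gram.shape[0], device=gram.device)
        return off_diag.pow(2).mean()


def shift_style(feat: torch.Tensor, bank: StyleBasisBank) -> torch.Tensor:
    """Re-normalize features toward a style synthesized from the basis bank (AdaIN-like)."""
    mu, sigma, _ = bank.project(feat)
    feat_norm = (feat - feat.mean(dim=-1, keepdim=True)) / (feat.std(dim=-1, keepdim=True) + 1e-5)
    return feat_norm * sigma + mu


def consistency_loss(prompted_emb: torch.Tensor, frozen_clip_emb: torch.Tensor) -> torch.Tensor:
    """Keep prompted embeddings aligned with the frozen CLIP embeddings (cosine distance)."""
    return (1.0 - F.cosine_similarity(prompted_emb, frozen_clip_emb.detach(), dim=-1)).mean()


if __name__ == "__main__":
    bank = StyleBasisBank(num_bases=8, feat_dim=512)
    feats = torch.randn(4, 512)                 # stand-in for prompted image/text features
    shifted = shift_style(feats, bank)          # style-shifted features
    frozen = torch.randn(4, 512)                # stand-in for frozen-CLIP embeddings
    loss = consistency_loss(shifted, frozen) + 0.1 * bank.diversity_loss()
    loss.backward()
    print(float(loss))

In this reading, the diversity term spreads the bases apart so the synthesized shifts cover varied styles, while the consistency term anchors the adapted model to the zero-shot CLIP embedding space; the actual loss definitions and style statistics in Style-Pro may differ.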
Related Material
@InProceedings{Talemi_2025_WACV,
    author    = {Talemi, Niloufar Alipour and Kashiani, Hossein and Afghah, Fatemeh},
    title     = {Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {6207-6216}
}