AAPL: Adding Attributes to Prompt Learning for Vision-Language Models

Gahyeon Kim, Sohee Kim, Seokju Lee; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 1572-1582

Abstract


Recent advances in large pre-trained vision-language models have demonstrated remarkable performance on zero-shot downstream tasks. Building upon this, recent studies such as CoOp and CoCoOp have proposed the use of prompt learning, where the context within a prompt is replaced with learnable vectors, leading to significant improvements over manually crafted prompts. However, the performance improvement for unseen classes is still marginal, and to tackle this problem, data augmentation has been frequently used in traditional zero-shot learning techniques. Through our experiments, we have identified an important issue in CoOp and CoCoOp: the context learned through traditional image augmentation is biased toward seen classes, negatively impacting generalization to unseen classes. To address this problem, we propose adversarial token embedding to disentangle low-level visual augmentation features from high-level class information when inducing bias in learnable prompts. Through our novel mechanism, called "Adding Attributes to Prompt Learning" (AAPL), we guide the learnable context to effectively extract text features by focusing on high-level features for unseen classes. We have conducted experiments across 11 datasets, and overall, AAPL shows favorable performance compared to existing methods in few-shot learning, zero-shot learning, cross-dataset, and domain generalization tasks.
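
The abstract refers to prompt learning in the style of CoOp, where the hand-written context words of a prompt (e.g., "a photo of a") are replaced by learnable vectors prepended to each class-name embedding. The snippet below is a minimal illustrative sketch of that idea only, not the authors' AAPL implementation; the module name, dimensions, and the placeholder class embeddings are assumptions for demonstration (in practice the class embeddings come from a frozen CLIP text encoder's token embedding layer).

```python
import torch
import torch.nn as nn

class LearnableContext(nn.Module):
    """Minimal CoOp-style prompt learner (illustrative sketch, not AAPL itself):
    a shared set of learnable context vectors is prepended to each class-name
    embedding before being fed to a frozen text encoder."""
    def __init__(self, n_ctx=16, ctx_dim=512, n_classes=10):
        super().__init__()
        # Learnable context vectors, shared across classes; these replace the
        # manually crafted words such as "a photo of a".
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)
        # Placeholder class-name embeddings (hypothetical); normally obtained
        # from the frozen CLIP token embedding of each class name.
        self.register_buffer("cls_emb", torch.randn(n_classes, 1, ctx_dim))

    def forward(self):
        n_classes = self.cls_emb.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)   # (C, n_ctx, D)
        # Concatenate [learnable context | class token] for every class.
        return torch.cat([ctx, self.cls_emb], dim=1)            # (C, n_ctx+1, D)

prompts = LearnableContext()()
print(prompts.shape)  # torch.Size([10, 17, 512])
```

Only the context vectors are optimized; the vision-language backbone stays frozen, which is what makes the learned context prone to the seen-class bias that AAPL targets.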

Related Material


BibTeX
@InProceedings{Kim_2024_CVPR,
    author    = {Kim, Gahyeon and Kim, Sohee and Lee, Seokju},
    title     = {AAPL: Adding Attributes to Prompt Learning for Vision-Language Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {1572-1582}
}