-
[pdf]
[supp]
[arXiv]
[bibtex]@InProceedings{Zhu_2023_ICCV, author = {Zhu, Beier and Niu, Yulei and Han, Yucheng and Wu, Yue and Zhang, Hanwang}, title = {Prompt-aligned Gradient for Prompt Tuning}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2023}, pages = {15659-15669} }
Prompt-aligned Gradient for Prompt Tuning
Abstract
Thanks to the large pre-trained vision-language models (VLMs) like CLIP, we can craft a zero-shot classifier by discrete prompt design, e.g., the confidence score of an image being "[CLASS]" can be obtained by using the VLM provided similarity between the image and the prompt sentence "a photo of a [CLASS]". Furthermore, prompting shows great potential for fast adaptation of VLMs to downstream tasks if we fine-tune the soft prompts with few samples. However, we find a common failure that improper fine-tuning or learning with extremely few-shot samples may even under-perform the zero-shot prediction. Existing methods still address this problem by using traditional anti-overfitting techniques such as early stopping and data augmentation, which lack a principled solution specific to prompting. In this paper, we present Prompt-aligned Gradient, dubbed ProGrad to prevent prompt tuning from forgetting the general knowledge learned from VLMs. In particular, ProGrad only updates the prompt whose gradient is aligned (or non-conflicting) to the general knowledge, which is represented as the optimization direction offered by the predefined prompt predictions. Extensive experiments under the few-shot learning, domain generalization, base-to-new generalization and cross-dataset transfer settings demonstrate
the stronger few-shot generalization ability of ProGrad over state-of-the-art prompt tuning methods. Codes and theoretical proof are in Appendix.
Related Material