Visual-Language Prompt Tuning With Knowledge-Guided Context Optimization

Yao, Hantao; Zhang, Rui; Xu, Changsheng

Hantao Yao, Rui Zhang, Changsheng Xu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 6757-6767

Abstract

Prompt tuning is an effective way to adapt the pretrained visual-language model (VLM) to the downstream task using task-related textual tokens. Representative CoOp-based works combine the learnable textual tokens with the class tokens to obtain specific textual knowledge. However, the specific textual knowledge has worse generalizable to the unseen classes because it forgets the essential general textual knowledge having a strong generalization ability. To tackle this issue, we introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization ability of the learnable prompt for unseen classes. To remember the essential general knowledge, KgCoOp constructs a regularization term to ensure that the essential general textual knowledge can be embedded into the special textual knowledge generated by the learnable prompt. Especially, KgCoOp minimizes the discrepancy between the textual embeddings generated by learned prompts and the hand-crafted prompts. Finally, adding the KgCoOp upon the contrastive loss can make a discriminative prompt for both seen and unseen tasks. Extensive evaluation of several benchmarks demonstrates that the proposed Knowledge-guided Context Optimization is an efficient method for prompt tuning, i.e., achieves better performance with less training time.

Related Material

[pdf] [supp] [arXiv]

[bibtex]

@InProceedings{Yao_2023_CVPR, author = {Yao, Hantao and Zhang, Rui and Xu, Changsheng}, title = {Visual-Language Prompt Tuning With Knowledge-Guided Context Optimization}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2023}, pages = {6757-6767} }