Prompt Learning with One-Shot Setting based Feature Space Analysis in Vision-and-Language Models

Yuki Hirohashi, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 7761-7770

Abstract


Prompt learning uses few-shot data and labels to obtain optimal prompts that achieve high performance on downstream tasks. Existing prompt learning methods generate high-quality prompts suited to downstream tasks but tend to perform poorly when only very limited data (e.g., one shot) is available. We address this challenging one-shot scenario and propose a novel architecture for prompt learning called the Image-Text Feature Alignment Branch (ITFAB). ITFAB pulls text features closer to the centroids of the image features and separates text features of different classes, resolving misalignment in the feature space and thereby facilitating the acquisition of high-quality prompts from very limited data. In the one-shot setting, our method outperforms the existing CoOp and CoCoOp methods and in some cases even surpasses CoCoOp's 16-shot performance. Tests on different datasets and domains show that ITFAB almost matches CoCoOp's effectiveness. It also works with recent prompt learning methods such as MaPLe and PromptSRC, improving their performance in the one-shot setting.
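The abstract does not give the exact objective, but the described behavior, pulling each class's text feature toward the centroid of that class's image features while pushing text features of different classes apart, can be sketched as an alignment-plus-separation loss. The PyTorch snippet below is a minimal illustration under that assumption only; the function name itfab_loss and the margin and sep_weight parameters are hypothetical placeholders, not taken from the paper.

    import torch
    import torch.nn.functional as F

    def itfab_loss(text_feats, image_feats, labels, margin=0.5, sep_weight=1.0):
        # text_feats:  (C, D) one learned text feature per class
        # image_feats: (N, D) image features from the frozen encoder
        # labels:      (N,)   int64 class index of each image
        C, D = text_feats.shape
        text_feats = F.normalize(text_feats, dim=-1)
        image_feats = F.normalize(image_feats, dim=-1)

        # Per-class centroids of the image features
        # (in the one-shot setting each centroid is a single image feature).
        centroids = torch.zeros(C, D, dtype=image_feats.dtype,
                                device=image_feats.device)
        centroids.index_add_(0, labels, image_feats)
        counts = torch.bincount(labels, minlength=C).clamp(min=1).unsqueeze(1)
        centroids = F.normalize(centroids / counts, dim=-1)

        # Alignment term: pull each class's text feature toward its centroid.
        align = (1.0 - (text_feats * centroids).sum(dim=-1)).mean()

        # Separation term: push text features of different classes apart by
        # penalizing pairwise cosine similarity above a margin.
        sim = text_feats @ text_feats.t()
        off_diag = sim[~torch.eye(C, dtype=torch.bool, device=sim.device)]
        separate = F.relu(off_diag - margin).mean()

        return align + sep_weight * separate

In practice such a term would be added to the usual prompt-learning objective (e.g., cross-entropy over cosine-similarity logits); the margin and weighting above are free parameters, not values reported by the authors.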

Related Material


BibTeX

@InProceedings{Hirohashi_2024_CVPR,
    author    = {Hirohashi, Yuki and Hirakawa, Tsubasa and Yamashita, Takayoshi and Fujiyoshi, Hironobu},
    title     = {Prompt Learning with One-Shot Setting based Feature Space Analysis in Vision-and-Language Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {7761-7770}
}