Learning Common and Specific Visual Prompts for Domain Generalization
Although fine-tuning a pre-trained large-scale model has become an effective approach to domain generalization, domain shifts still pose a significant challenge when transferring models to unseen test domains. In this paper, we study how to effectively adapt pre-trained vision Transformers to domain generalization problems in image classification. To this end, we propose a novel Common-Specific Visual Prompt Tuning (CSVPT) method for transferring large-scale vision Transformer models to unknown test domains. Unlike existing methods, which learn a fixed visual prompt for each task, CSVPT jointly learns domain-common prompts that capture the task context and sample-specific prompts that capture information about the data distribution; the latter are generated for each sample by a trainable prompt-generating module (PGM). By combining the domain-common and sample-specific prompts, the visual prompts learned by CSVPT are conditioned on each input sample rather than fixed once learned, which facilitates out-of-distribution generalization. Extensive experimental results demonstrate the effectiveness of CSVPT, and CSVPT with the ViT-L/14 backbone achieves state-of-the-art (SOTA) performance on five widely used benchmark datasets.
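The prompt-construction idea described above can be sketched as follows. This is a minimal, framework-free illustration under assumed shapes and a toy linear PGM; the names (`pgm`, `prepend_prompts`), dimensions, and pooling choice are illustrative assumptions, not the authors' implementation, which would use learned parameters in a deep-learning framework.

```python
# Illustrative sketch of CSVPT-style prompt construction (assumptions:
# embedding size, prompt counts, and the linear PGM are placeholders).
import random

D = 8            # token embedding dimension (assumed)
N_COMMON = 2     # number of domain-common prompts (assumed)
N_SPECIFIC = 2   # number of sample-specific prompts (assumed)

random.seed(0)

def rand_vec(d):
    return [random.uniform(-1.0, 1.0) for _ in range(d)]

# Domain-common prompts: shared across all samples once trained.
common_prompts = [rand_vec(D) for _ in range(N_COMMON)]

# Toy PGM weights: a single linear map from a pooled patch embedding
# to N_SPECIFIC prompt vectors (a real PGM would be a trained module).
W = [rand_vec(D) for _ in range(N_SPECIFIC * D)]

def pgm(patch_tokens):
    """Generate sample-specific prompts from this sample's patch tokens."""
    # Mean-pool the patch tokens into one D-dim summary vector.
    pooled = [sum(tok[j] for tok in patch_tokens) / len(patch_tokens)
              for j in range(D)]
    # Linear map, then reshape into N_SPECIFIC prompt vectors.
    flat = [sum(W[i][j] * pooled[j] for j in range(D))
            for i in range(N_SPECIFIC * D)]
    return [flat[k * D:(k + 1) * D] for k in range(N_SPECIFIC)]

def prepend_prompts(patch_tokens):
    """Prepend common + sample-specific prompts to the token sequence."""
    specific = pgm(patch_tokens)  # conditioned on the input sample
    return common_prompts + specific + patch_tokens

# Stand-in for the patch embeddings of one image (4 tokens).
tokens = [rand_vec(D) for _ in range(4)]
augmented = prepend_prompts(tokens)
print(len(augmented))  # N_COMMON + N_SPECIFIC + 4 patch tokens
```

Because `pgm` depends on the input's patch tokens, two different images receive different specific prompts while sharing the same common prompts, which is the property the abstract attributes to CSVPT.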