Learning Common and Specific Visual Prompts for Domain Generalization

Aodi Li, Liansheng Zhuang, Shuo Fan, Shafei Wang; Proceedings of the Asian Conference on Computer Vision (ACCV), 2022, pp. 4260-4275


Although fine-tuning a pre-trained large-scale model has become an effective method for domain generalization, domain shifts still pose a significant challenge to successfully transferring models to unseen test domains. In this paper, we study how to effectively adapt pre-trained vision Transformers for domain generalization problems in image classification. To this end, this paper proposes a novel Common-Specific Visual Prompt Tuning (CSVPT) method to transfer large-scale vision Transformer models to unknown test domains. Unlike existing methods, which learn fixed visual prompts for each task, CSVPT jointly learns domain-common prompts to capture the task context and sample-specific prompts to capture information about the data distribution; the latter are generated for each sample by a trainable prompt-generating module (PGM). By combining the domain-common prompts and the sample-specific prompts, the visual prompts learned by CSVPT are conditioned on each input sample rather than fixed once learned, which improves out-of-distribution generalization. Extensive experimental results show the effectiveness of CSVPT, and CSVPT with the ViT-L/14 backbone achieves state-of-the-art (SOTA) performance on five widely used benchmark datasets.
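The core idea of the abstract — shared task-level prompts combined with input-conditioned prompts produced by a prompt-generating module — can be sketched as follows. This is a minimal numpy illustration under stated assumptions, not the paper's implementation: the prompt shapes, the mean-pooling of patch tokens, and the single linear map used as the PGM are all hypothetical simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # toy embedding dimension
n_tokens = 4   # patch tokens per image
n_prompts = 2  # number of visual prompts

# Domain-common prompts: shared across all samples, learned once per task
# (assumed shape [n_prompts, d]).
common_prompts = rng.normal(size=(n_prompts, d))

# Hypothetical PGM: a single linear map from the mean-pooled patch
# embedding to the sample-specific prompts (the paper's PGM is trainable;
# here the weights are just random for illustration).
W_pgm = rng.normal(size=(d, n_prompts * d)) * 0.1

def csvpt_prompts(patch_tokens):
    """Combine domain-common and sample-specific prompts (sketch)."""
    pooled = patch_tokens.mean(axis=0)             # [d]
    specific = (pooled @ W_pgm).reshape(n_prompts, d)
    # The combined prompts are conditioned on the input sample.
    return common_prompts + specific

x = rng.normal(size=(n_tokens, d))                 # one image's patch tokens
prompts = csvpt_prompts(x)
# Prompts are prepended to the patch tokens before the Transformer encoder.
tokens_with_prompts = np.concatenate([prompts, x], axis=0)
print(tokens_with_prompts.shape)  # (6, 8)
```

Because the sample-specific part depends on the input, two different images receive different prompts even after training, which is the property the abstract credits for better out-of-distribution generalization.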

Related Material

@InProceedings{Li_2022_ACCV,
    author    = {Li, Aodi and Zhuang, Liansheng and Fan, Shuo and Wang, Shafei},
    title     = {Learning Common and Specific Visual Prompts for Domain Generalization},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2022},
    pages     = {4260-4275}
}