Empowering Unsupervised Domain Adaptation With Large-Scale Pre-Trained Vision-Language Models

Zhengfeng Lai, Haoping Bai, Haotian Zhang, Xianzhi Du, Jiulong Shan, Yinfei Yang, Chen-Nee Chuah, Meng Cao; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 2691-2701

Abstract


Unsupervised Domain Adaptation (UDA) aims to leverage a labeled source domain to solve tasks on an unlabeled target domain. Traditional UDA methods face a tradeoff between domain alignment and semantic class discriminability, especially when a large domain gap exists between the source and target domains. Efforts to apply large-scale pre-training to bridge domain gaps remain limited. In this work, we propose that Vision-Language Models (VLMs) can empower UDA tasks thanks to their language-aligned training paradigm and large-scale pre-training datasets. For example, CLIP and GLIP have shown promising zero-shot generalization in classification and detection tasks. However, directly fine-tuning these VLMs on downstream tasks can be computationally expensive and does not scale when multiple domains need to be adapted. Therefore, we first study an efficient adaptation of VLMs that preserves their original knowledge while maximizing their flexibility to learn new knowledge. We then design a domain-aware pseudo-labeling scheme tailored to VLMs for domain disentanglement. We show the superiority of the proposed methods on four UDA-classification and two UDA-detection benchmarks, with a significant improvement (+9.9%) on DomainNet.
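
To make the idea of domain-aware pseudo-labeling with a VLM concrete, below is a minimal sketch, not the authors' exact scheme, of how zero-shot CLIP predictions can be turned into confident pseudo-labels for an unlabeled target domain by injecting the domain name into the text prompts. It assumes the open-source `clip` package (github.com/openai/CLIP); the class list, domain name, confidence threshold, and the `pseudo_label` helper are all hypothetical choices for illustration.

    # Sketch: domain-aware zero-shot pseudo-labeling with CLIP (illustrative only).
    import torch
    import clip

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    class_names = ["dog", "cat", "car"]   # hypothetical target classes
    domain_name = "sketch"                # hypothetical target domain (e.g., DomainNet "sketch")

    # Domain-aware prompts: mention the target domain so the text embeddings
    # describe how each class looks in that domain.
    prompts = [f"a {domain_name} of a {c}" for c in class_names]
    text_tokens = clip.tokenize(prompts).to(device)

    with torch.no_grad():
        text_features = model.encode_text(text_tokens)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    def pseudo_label(images, threshold=0.9):
        """Return (labels, mask); mask selects predictions confident enough to keep."""
        with torch.no_grad():
            image_features = model.encode_image(images.to(device))
            image_features = image_features / image_features.norm(dim=-1, keepdim=True)
            probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
        conf, labels = probs.max(dim=-1)
        return labels, conf >= threshold

In practice, `pseudo_label` would be applied to batches of preprocessed target-domain images (e.g., from a PyTorch DataLoader built with `preprocess`), and only the confident subset would be used as supervision when adapting the model to the new domain.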

Related Material


@InProceedings{Lai_2024_WACV,
  author    = {Lai, Zhengfeng and Bai, Haoping and Zhang, Haotian and Du, Xianzhi and Shan, Jiulong and Yang, Yinfei and Chuah, Chen-Nee and Cao, Meng},
  title     = {Empowering Unsupervised Domain Adaptation With Large-Scale Pre-Trained Vision-Language Models},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {January},
  year      = {2024},
  pages     = {2691-2701}
}