[pdf]
[bibtex]
@InProceedings{Jiang_2025_WACV,
  author    = {Jiang, Haoyu and Cheng, Zhi-Qi and Moreira, Gabriel and Zhu, Jiawen and Sun, Jingdong and Ren, Bukun and He, Jun-Yan and Dai, Qi and Hua, Xian-Sheng},
  title     = {UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {5429-5438}
}
UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval
Abstract
Universal Cross-Domain Retrieval (UCDR) retrieves relevant images from unseen domains and classes without relying on semantic labels, ensuring robust generalization. Existing methods commonly employ prompt tuning with pre-trained vision-language models but are inherently limited by static prompts, which reduce adaptability. We propose UCDR-Adapter, which enhances pre-trained models with adapters and dynamic prompt generation through a two-phase training strategy. First, Source Adapter Learning integrates class semantics with domain-specific visual knowledge using a Learnable Textual Semantic Template and optimizes Class and Domain Prompts via momentum updates and dual loss functions for robust alignment. Second, Target Prompt Generation creates dynamic prompts by attending to masked source prompts, enabling seamless adaptation to unseen domains and classes. Unlike prior approaches, UCDR-Adapter dynamically adapts to evolving data distributions, enhancing both flexibility and generalization. During inference, only the image branch and the generated prompts are used, eliminating reliance on textual inputs and enabling highly efficient retrieval. Extensive benchmark experiments show that UCDR-Adapter consistently outperforms ProS and other state-of-the-art methods on the UCDR, U^cCDR, and U^dCDR settings.
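
Below is a minimal, hypothetical PyTorch sketch of the Target Prompt Generation idea described in the abstract: a dynamic prompt for an unseen domain or class is produced by attending over a bank of partially masked source prompts learned in the first phase. The module name, tensor shapes, and hyperparameters are illustrative assumptions, not the authors' released implementation.

# Hypothetical sketch of dynamic prompt generation via attention over masked
# source prompts. Names, shapes, and hyperparameters are assumptions.
from typing import Optional

import torch
import torch.nn as nn


class TargetPromptGenerator(nn.Module):
    def __init__(self, dim: int = 512, num_source_prompts: int = 64, num_heads: int = 8):
        super().__init__()
        # Source (class/domain) prompts, assumed to come from the first training phase.
        self.source_prompts = nn.Parameter(torch.randn(num_source_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, image_feats: torch.Tensor,
                prompt_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        """image_feats: (B, D) features from a frozen CLIP-style image encoder.
        prompt_mask: (B, P) boolean mask; True marks source prompts to ignore.
        Returns a (B, D) dynamic prompt conditioned on the query image."""
        query = image_feats.unsqueeze(1)                                    # (B, 1, D)
        keys = self.source_prompts.unsqueeze(0).expand(image_feats.size(0), -1, -1)
        dynamic_prompt, _ = self.attn(query, keys, keys, key_padding_mask=prompt_mask)
        return dynamic_prompt.squeeze(1)


# Toy usage: mask out a few source prompts and generate prompts for a small batch.
if __name__ == "__main__":
    gen = TargetPromptGenerator()
    feats = torch.randn(4, 512)
    mask = torch.zeros(4, 64, dtype=torch.bool)
    mask[:, :8] = True
    print(gen(feats, mask).shape)  # torch.Size([4, 512])
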
Related Material