Unknown Prompt the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization

Mainak Singha, Ankit Jha, Shirsha Bose, Ashwin Nair, Moloud Abdar, Biplab Banerjee; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 13309-13319

Abstract


We study Open Domain Generalization (ODG), a setting marked by both domain and category shifts between the labeled source domains seen during training and the unlabeled target domains encountered at test time. Existing ODG solutions are limited by the constrained generalization of traditional CNN backbones and by errors in detecting open (unknown-class) target samples in the absence of prior knowledge. To address these pitfalls, we introduce ODG-CLIP, which harnesses the semantic power of the vision-language model CLIP. Our framework makes three primary contributions. First, unlike prevailing paradigms, we formulate ODG as a multi-class classification problem encompassing both known and novel categories. Central to this approach is a dedicated prompt for detecting unknown-class samples; to train it, we use a readily available Stable Diffusion model to generate proxy images for the open class. Second, to obtain domain-tailored classification (prompt) weights while balancing precision and simplicity, we devise a novel visual style-centric prompt learning mechanism. Third, we infuse images with class-discriminative knowledge derived from the prompt space to improve the fidelity of CLIP's visual embeddings, and we introduce a novel objective that preserves the consistency of this infused semantic information across domains, especially for the shared classes. Through rigorous testing on diverse datasets covering both closed-set and open-set DG settings, ODG-CLIP consistently outperforms its peers, with performance gains of 8% to 16%. Code will be available at https://github.com/mainaksingha01/ODG-CLIP.
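
The first contribution, treating ODG as a (C+1)-way classification problem in which one dedicated prompt stands for every unknown category, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes image and prompt embeddings are already available (in ODG-CLIP they would come from CLIP's encoders and learned prompts; here they are hand-written 3-d toy vectors) and it assumes a softmax temperature of 0.01. The helper names (`cosine_logits`, `predict`, `cross_entropy`) are hypothetical, and the sketch leaves out the style-centric prompts and the diffusion-generated images themselves. It shows only how a sample is scored against C known prompts plus one unknown prompt, and how a proxy open-class image would be supervised with label C.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (CLIP-style models compare unit-norm embeddings)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_logits(image_emb: Vector, prompt_embs: Sequence[Vector],
                  temperature: float = 0.01) -> List[float]:
    """Temperature-scaled cosine similarity between one image and each prompt."""
    img = l2_normalize(image_emb)
    logits = []
    for prompt in prompt_embs:
        p = l2_normalize(prompt)
        logits.append(sum(a * b for a, b in zip(img, p)) / temperature)
    return logits


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(image_emb: Vector, known_prompts: Sequence[Vector],
            unknown_prompt: Vector,
            temperature: float = 0.01) -> Tuple[int, List[float]]:
    """(C+1)-way prediction: indices 0..C-1 are known classes, index C is 'unknown'."""
    prompts = list(known_prompts) + [unknown_prompt]
    probs = softmax(cosine_logits(image_emb, prompts, temperature))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


def cross_entropy(image_emb: Vector, label: int, known_prompts: Sequence[Vector],
                  unknown_prompt: Vector, temperature: float = 0.01) -> float:
    """Per-sample loss; a synthetic proxy image for the open class would use label C."""
    _, probs = predict(image_emb, known_prompts, unknown_prompt, temperature)
    return -math.log(probs[label])


# Hypothetical toy "embeddings": two known classes plus the unknown prompt.
KNOWN = [[1.0, 0.0, 0.0],   # class 0, e.g. "dog"
         [0.0, 1.0, 0.0]]   # class 1, e.g. "cat"
UNKNOWN = [0.0, 0.0, 1.0]   # index 2 = open class

dog_like, _ = predict([0.9, 0.1, 0.0], KNOWN, UNKNOWN)
novel, _ = predict([0.1, 0.0, 0.95], KNOWN, UNKNOWN)
print(dog_like, novel)  # → 0 2
```

Because "unknown" is an ordinary output index in this sketch, the test-time decision is a plain argmax over C+1 scores, with no separate rejection threshold to tune.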

BibTeX
@InProceedings{Singha_2024_CVPR,
    author    = {Singha, Mainak and Jha, Ankit and Bose, Shirsha and Nair, Ashwin and Abdar, Moloud and Banerjee, Biplab},
    title     = {Unknown Prompt the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {13309-13319}
}