CLIP the Gap: A Single Domain Generalization Approach for Object Detection

Vidit, Vidit; Engilberge, Martin; Salzmann, Mathieu

Vidit Vidit, Martin Engilberge, Mathieu Salzmann; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 3219-3229

Abstract

Single Domain Generalization (SDG) tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain. While this has been well studied for image classification, the literature on SDG object detection remains almost non-existent. To address the challenges of simultaneously learning robust object localization and representation, we propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts. We achieve this via a semantic augmentation strategy acting on the features extracted by the detector backbone, as well as a text-based classification loss. Our experiments demonstrate the benefits of our approach, which outperforms the only existing SDG object detection method, Single-DGOD [49], by 10% on its own diverse-weather driving benchmark.

Related Material

[bibtex]

@InProceedings{Vidit_2023_CVPR,
    author    = {Vidit, Vidit and Engilberge, Martin and Salzmann, Mathieu},
    title     = {CLIP the Gap: A Single Domain Generalization Approach for Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {3219-3229}
}