Collaborating Foundation Models for Domain Generalized Semantic Segmentation

Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 3108-3119

Abstract


Domain Generalized Semantic Segmentation (DGSS) deals with training a model on a labeled source domain with the aim of generalizing to unseen domains during inference. Existing DGSS methods typically effectuate robust features by means of Domain Randomization (DR). Such an approach is often limited, as it can only account for style diversification and not content. In this work, we take an orthogonal approach to DGSS and propose to use an assembly of CoLlaborative FOUndation models for Domain Generalized Semantic Segmentation (CLOUDS). In detail, CLOUDS is a framework that integrates Foundation Models of various kinds: (i) a CLIP backbone for its robust feature representation, (ii) a Diffusion Model to diversify the content, thereby covering various modes of the possible target distribution, and (iii) the Segment Anything Model (SAM) for iteratively refining the predictions of the segmentation model. Extensive experiments show that our CLOUDS excels in adapting from synthetic to real DGSS benchmarks and under varying weather conditions, notably outperforming prior methods by 5.6% and 6.7% in averaged mIoU, respectively. Our code is available at https://github.com/yasserben/CLOUDS
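
The abstract states that SAM is used to refine the segmentation model's predictions. As a rough illustration of how class-agnostic SAM masks can clean up a dense prediction, the following minimal NumPy sketch applies per-mask majority voting; the function refine_with_sam_masks and the voting rule are illustrative assumptions, not necessarily the exact refinement procedure used in CLOUDS (see the paper and the official repository for the authors' implementation).

# Hedged sketch: refining a dense semantic prediction with class-agnostic SAM masks
# via per-mask majority voting. This only illustrates the general idea of using SAM
# regions to regularize a prediction; CLOUDS may use a different rule.
import numpy as np

def refine_with_sam_masks(pred: np.ndarray, sam_masks: list[np.ndarray]) -> np.ndarray:
    """pred: (H, W) integer class map from the segmentation model.
    sam_masks: list of (H, W) boolean masks produced by SAM (class-agnostic).
    Returns a refined (H, W) class map where each SAM region is filled with
    the majority class predicted inside it."""
    refined = pred.copy()
    for mask in sam_masks:
        if mask.sum() == 0:
            continue
        # Majority class of the initial prediction inside this SAM region.
        labels, counts = np.unique(pred[mask], return_counts=True)
        refined[mask] = labels[np.argmax(counts)]
    return refined

# Toy usage: a 4x4 prediction with one noisy pixel inside a (hypothetical) SAM region.
pred = np.array([[0, 0, 1, 1],
                 [0, 2, 1, 1],
                 [0, 0, 1, 1],
                 [3, 3, 3, 3]])
region = np.zeros((4, 4), dtype=bool)
region[:3, :2] = True  # hypothetical SAM mask covering the top-left block
print(refine_with_sam_masks(pred, [region]))
# The stray class-2 pixel is overwritten by the majority class 0 of its region.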

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Benigmim_2024_CVPR,
    author    = {Benigmim, Yasser and Roy, Subhankar and Essid, Slim and Kalogeiton, Vicky and Lathuili\`ere, St\'ephane},
    title     = {Collaborating Foundation Models for Domain Generalized Semantic Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {3108-3119}
}