Adversarially Domain-adaptive Latent Diffusion for Unsupervised Semantic Segmentation

Jongmin Yu, Zhongtian Sun, Chi Bene Chen, Jinhong Yang, Shan Luo; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops, 2025, pp. 2614-2624

Abstract


Semantic segmentation requires extensive pixel-level annotation, motivating unsupervised domain adaptation (UDA) to transfer knowledge from labelled source domains to unlabelled or weakly labelled target domains. One efficient approach is to use synthetic datasets rendered in controlled virtual environments, such as video games or traffic simulators, where pixel-level annotations can be generated automatically. However, even with such datasets, it remains challenging to learn a well-generalised representation that describes both domains, owing to probabilistic or geometric differences between virtual and real-world images. In this work, we introduce a latent diffusion model-based semantic segmentation method called Inter-Coder Connected Latent Diffusion (ICCLD), together with an unsupervised domain adaptation approach. The model employs an inter-coder connection to enhance contextual understanding and preserve fine details, while adversarial learning aligns the latent feature distributions of the two domains during the diffusion process. Experiments on GTA5, Synthia, and Cityscapes demonstrate that ICCLD outperforms state-of-the-art (SOTA) UDA methods, achieving mIoU scores of 74.4 (GTA5 → Cityscapes) and 67.2 (Synthia → Cityscapes).
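For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed. It is not taken from the paper: the function name `mean_iou`, the flat label-list input, and the choice to report a percentage are illustrative assumptions, and benchmark evaluations usually accumulate the counts over the whole validation set rather than a single image.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU (in percent) over classes present in pred or target.

    pred and target are flat sequences of integer class labels of equal
    length. This helper is hypothetical and only illustrates the metric.
    """
    inter = [0] * num_classes         # pixels where pred == target == c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            inter[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - inter[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(inter[c] / union)

    return 100.0 * sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    print(mean_iou(pred, target, num_classes=3))
```

In this toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is one half.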

Related Material


@InProceedings{Yu_2025_CVPR,
    author    = {Yu, Jongmin and Sun, Zhongtian and Chen, Chi Bene and Yang, Jinhong and Luo, Shan},
    title     = {Adversarially Domain-adaptive Latent Diffusion for Unsupervised Semantic Segmentation},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {2614-2624}
}