CrossMatch: Source-Free Domain Adaptive Semantic Segmentation via Cross-Modal Consistency Training

Yifang Yin, Wenmiao Hu, Zhenguang Liu, Guanfeng Wang, Shili Xiang, Roger Zimmermann; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 21786-21796

Abstract


Source-free domain adaptive semantic segmentation has gained increasing attention recently. It eases the requirement of full data access to the source domain by transferring knowledge only from a well-trained source model. However, without the supervision of labeled source data, reducing the uncertainty of the target pseudo labels inevitably becomes more challenging. In this work, we propose a novel asymmetric two-stream architecture that learns more robustly from noisy pseudo labels. Our approach simultaneously conducts dual-head pseudo label denoising and cross-modal consistency regularization. For the former, we introduce a multimodal auxiliary network during training (and discard it during inference), which effectively improves the correctness of the pseudo labels by leveraging guidance from depth information. For the latter, we enforce a new cross-modal pixel-wise consistency between the predictions of the two streams, encouraging our model to behave smoothly under both modality variance and image perturbations. This consistency serves as an effective regularization that further reduces the impact of inaccurate pseudo labels in source-free unsupervised domain adaptation. Experiments on the GTA5 to Cityscapes and SYNTHIA to Cityscapes benchmarks demonstrate the superiority of our proposed method, which achieves new state-of-the-art mIoU of 57.7% and 57.5%, respectively.
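The abstract does not state the exact form of the cross-modal pixel-wise consistency term, so the following is only a minimal sketch of one common way such a term can be written: the mean squared difference between the per-pixel class probabilities predicted by the two streams. The function names `softmax` and `pixelwise_consistency`, the choice of squared error, and the use of plain Python lists in place of image tensors are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert one pixel's class logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixelwise_consistency(logits_a, logits_b):
    """Hypothetical consistency term between two prediction streams.

    logits_a, logits_b: lists of pixels, each pixel a list of class logits.
    Returns the mean (over pixels and classes) squared difference between
    the two streams' softmax probabilities. Squared error is an assumed
    choice; the paper may use a different distance.
    """
    if not logits_a or len(logits_a) != len(logits_b):
        raise ValueError("both streams must predict the same non-empty set of pixels")
    total = 0.0
    for pixel_a, pixel_b in zip(logits_a, logits_b):
        probs_a, probs_b = softmax(pixel_a), softmax(pixel_b)
        total += sum((p - q) ** 2 for p, q in zip(probs_a, probs_b)) / len(probs_a)
    return total / len(logits_a)


# Two toy "streams" predicting 3 classes for 2 pixels.
stream_rgb = [[2.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
stream_multimodal = [[0.0, 2.0, -1.0], [0.5, 0.5, 0.5]]
loss = pixelwise_consistency(stream_rgb, stream_multimodal)
```

In this sketch the term is zero when both streams agree on every pixel and grows as their predicted distributions diverge, which is the behavior a consistency regularizer is meant to penalize.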

Related Material


[bibtex]
@InProceedings{Yin_2023_ICCV,
    author    = {Yin, Yifang and Hu, Wenmiao and Liu, Zhenguang and Wang, Guanfeng and Xiang, Shili and Zimmermann, Roger},
    title     = {CrossMatch: Source-Free Domain Adaptive Semantic Segmentation via Cross-Modal Consistency Training},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {21786-21796}
}