Let Me Show You How It's Done - Cross-modal Knowledge Distillation as Pretext Task for Semantic Segmentation

Rudhishna Narayanan Nair, Ronny Hänsch; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 595-603

Abstract


While Synthetic Aperture Radar (SAR) images have several advantages, including robustness to weather conditions and independence from sunlight, they are much harder for human annotators to interpret, leading to fewer and smaller training datasets than for optical imagery. This is particularly true for tasks such as building footprint extraction, where the side-looking nature of SAR complicates the perception of the object of interest. This work aims to leverage the large amount of labeled optical remote sensing imagery, along with unlabeled paired PolSAR data, for semantic segmentation of SAR images through cross-modal knowledge distillation. A network trained on optical images acts as a teacher model that trains a student model by providing pseudo-labels for aligned images of both modalities. We test the proposed framework with multiple architectures and observe significantly increased performance after fine-tuning the student, i.e., an increase of 5-20% in IoU compared to training a network on SAR imagery from scratch.
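The following is a minimal sketch of the cross-modal distillation idea described in the abstract, not the authors' implementation: a frozen teacher segments the optical image of a co-registered optical/SAR pair and its per-pixel argmax serves as pseudo-labels for training the SAR student, which is later fine-tuned on the smaller labeled SAR set. The names teacher_net, student_net, optimizer, and the data tensors are assumed placeholders.

import torch
import torch.nn.functional as F

def distill_step(teacher_net, student_net, optimizer, optical, sar):
    """Pretext-task step: teacher pseudo-labels on optical guide the SAR student."""
    teacher_net.eval()
    with torch.no_grad():
        # Teacher segments the optical image; argmax gives per-pixel pseudo-labels (B, H, W).
        pseudo_labels = teacher_net(optical).argmax(dim=1)
    student_logits = student_net(sar)                  # (B, C, H, W) logits from the SAR student
    loss = F.cross_entropy(student_logits, pseudo_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def finetune_step(student_net, optimizer, sar, labels):
    """Fine-tuning step on the (smaller) labeled SAR dataset."""
    logits = student_net(sar)
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In this sketch the distillation loss is a hard-label cross-entropy on the teacher's argmax; soft-label variants (e.g., KL divergence on temperature-scaled logits) are a common alternative but are not implied by the abstract.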

Related Material


[pdf]
[bibtex]
@InProceedings{Nair_2024_CVPR,
  author    = {Nair, Rudhishna Narayanan and H\"ansch, Ronny},
  title     = {Let Me Show You How It's Done - Cross-modal Knowledge Distillation as Pretext Task for Semantic Segmentation},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2024},
  pages     = {595-603}
}