Extending Global-local View Alignment for Self-supervised Learning with Remote Sensing Imagery

Xinye Wanyan, Sachith Seneviratne, Shuchang Shen, Michael Kirley; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 2443-2453

Abstract


Since a large number of high-quality remote sensing images are readily accessible, exploiting this corpus of images with less manual annotation draws increasing attention. Self-supervised models acquire general feature representations by formulating a pretext task that generates pseudo-labels for massive unlabeled data to provide supervision for training. While prior studies have explored multiple self-supervised learning techniques in the remote sensing domain, pretext tasks based on local-global view alignment remain underexplored, despite achieving state-of-the-art results on natural imagery. Inspired by DINO [6], which employs an effective representation learning structure with knowledge distillation based on global-local view alignment, we formulate two pretext tasks for self-supervised learning on remote sensing imagery (SSLRS). Using these tasks, we explore the effectiveness of positive temporal contrast as well as multi-sized views on SSLRS. We extend DINO and propose DINO-MC, which uses local views of variously sized crops instead of a single fixed size in order to alleviate the limited variation in object size observed in remote sensing imagery. Our experiments demonstrate that, even when pre-trained on only 10% of the dataset, DINO-MC performs on par with or better than existing state-of-the-art SSLRS methods on multiple remote sensing tasks while using fewer computational resources. All code, models, and results are released at https://github.com/WennyXY/DINO-MC.
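To make the multi-sized local views concrete, here is a minimal, hypothetical sketch of DINO-style multi-crop sampling extended to several local crop sizes. It only computes crop boxes (no pixel data). The global scale range (0.4, 1.0) and 224-pixel output follow common DINO defaults. The local output sizes (96, 64, 32), the local scale range, and all function names are illustrative assumptions, not values or code taken from DINO-MC. The pairing rule mirrors DINO's loss: the teacher sees only global views, the student sees all views, and a view is never compared with itself.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Crop:
    top: int
    left: int
    side: int      # side length of the square crop in the source image (pixels)
    out_size: int  # side length the crop would be resized to before encoding


def random_crop(height, width, scale, out_size, rng):
    """Sample a square crop covering a random fraction `scale` of the image area."""
    area_frac = rng.uniform(*scale)
    side = max(1, int((area_frac * height * width) ** 0.5))
    side = min(side, height, width)  # keep the crop inside the image
    top = rng.randint(0, height - side)
    left = rng.randint(0, width - side)
    return Crop(top, left, side, out_size)


def multi_crop_views(height, width, n_global=2,
                     local_sizes=(96, 64, 32),  # hypothetical multi-size local crops
                     n_local_per_size=2, seed=None):
    """Return (global_views, local_views); local views come in several output sizes."""
    rng = random.Random(seed)
    global_views = [random_crop(height, width, (0.4, 1.0), 224, rng)
                    for _ in range(n_global)]
    local_views = [random_crop(height, width, (0.05, 0.4), size, rng)
                   for size in local_sizes
                   for _ in range(n_local_per_size)]
    return global_views, local_views


def view_pairs(n_global, n_total):
    """(teacher_view, student_view) index pairs used for global-local alignment.

    Views are indexed with globals first; the teacher only sees global views,
    the student sees every view, and identical views are skipped.
    """
    return [(t, s) for t in range(n_global) for s in range(n_total) if s != t]


if __name__ == "__main__":
    g, l = multi_crop_views(256, 256, seed=0)
    pairs = view_pairs(len(g), len(g) + len(l))
    print(len(g), len(l), len(pairs))  # 2 6 14
```

With two global views and two local views at each of three sizes, the student produces eight outputs and the loss would sum over 2 × 8 − 2 = 14 teacher-student pairs. Varying the local output size, rather than fixing it as in DINO, is the design choice the abstract motivates by the limited object-size variation in remote sensing imagery.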

Related Material


@InProceedings{Wanyan_2024_CVPR,
    author    = {Wanyan, Xinye and Seneviratne, Sachith and Shen, Shuchang and Kirley, Michael},
    title     = {Extending Global-local View Alignment for Self-supervised Learning with Remote Sensing Imagery},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {2443-2453}
}