@InProceedings{Linial_2025_WACV,
  author    = {Linial, Ori and Leifman, George and Blau, Yochai and Sherman, Nadav and Gigi, Yotam and Sirko, Wojciech and Beryozkin, Genady},
  title     = {Enhancing Remote Sensing Representations Through Mixed-Modality Masked Autoencoding},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
  month     = {February},
  year      = {2025},
  pages     = {507-516}
}
Enhancing Remote Sensing Representations Through Mixed-Modality Masked Autoencoding
Abstract
This paper presents an innovative approach to pretraining models for remote sensing by integrating optical and SAR (Synthetic Aperture Radar) data from the Sentinel-2 and Sentinel-1 satellites. Using a novel variation on the masked autoencoder (MAE) framework, our model incorporates a dual-task setup: reconstructing masked Sentinel-2 images and predicting corresponding Sentinel-1 images. This multi-task design enables the encoder to capture both spectral and structural features across diverse environmental conditions. Additionally, we introduce a "mixing" strategy in the pretraining phase, combining patches from both image sources, which mitigates spatial misalignment errors and enhances model robustness. Evaluation on segmentation and classification tasks, including Sen1Floods11, BigEarthNet, and UrbanSRSeg8, demonstrates significant improvements in adaptability and generalizability across varied downstream remote sensing applications. Our findings highlight the advantages of leveraging complementary modalities for more resilient and versatile land cover analysis.
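The patch "mixing" and MAE-style masking described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the mix and mask ratios, and the assumption that both modalities are already tokenized into patch embeddings of the same dimension are ours.

```python
import numpy as np

def mix_modalities(s2_patches, s1_patches, mix_ratio=0.25, rng=None):
    """Replace a random fraction of Sentinel-2 patches with the co-located
    Sentinel-1 patches (hypothetical sketch of the "mixing" strategy;
    both inputs are assumed to be (num_patches, dim) arrays)."""
    rng = rng or np.random.default_rng(0)
    n = s2_patches.shape[0]
    mixed = s2_patches.copy()
    idx = rng.choice(n, size=int(n * mix_ratio), replace=False)
    mixed[idx] = s1_patches[idx]  # swap in SAR patches at these positions
    return mixed, idx

def mask_patches(patches, mask_ratio=0.75, rng=None):
    """Standard MAE-style random masking: keep only a subset of patches
    as the visible input to the encoder."""
    rng = rng or np.random.default_rng(1)
    n = patches.shape[0]
    keep = np.sort(rng.choice(n, size=int(n * (1 - mask_ratio)), replace=False))
    return patches[keep], keep
```

In a full pipeline the encoder would see the visible mixed patches, and two decoder heads would be trained jointly, one reconstructing the masked Sentinel-2 patches and one predicting the corresponding Sentinel-1 patches.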