@InProceedings{Kerdreux_2025_CVPR,
    author    = {Kerdreux, Thomas and Tuel, Alexandre and Febvre, Quentin and Mouche, Alexis and Chapron, Bertrand},
    title     = {Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {3017-3027}
}
Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation
Abstract
Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO), demonstrating strong transferability across diverse remote sensing tasks. While much research has focused on network architectures and training strategies, the role of dataset curation, particularly in balancing and diversifying pre-training datasets, remains underexplored. In EO, this challenge is exacerbated by the strong redundancy and heavy-tailed distributions of satellite imagery, which can lead to biased representations and inefficient training. In this work, we introduce a dynamic dataset pruning strategy designed to enhance SSL pre-training efficiency by maximizing dataset diversity and balance. Our method iteratively refines the training set without relying on a pre-existing feature extractor, making it well-suited for domains where curated datasets are unavailable. We illustrate our approach on the Sentinel-1 Wave Mode (WV) Synthetic Aperture Radar (SAR) archive, a challenging dataset primarily composed of ocean observations. We train models from scratch on the entire Sentinel-1 WV data archive over 10 years. Our results, validated across three downstream tasks, show that dynamic pruning improves both computational efficiency and feature quality, leading to better transferability in real-world applications. This work provides a scalable and adaptable solution for dataset curation in EO, paving the way for more efficient and generalizable foundation models in remote sensing. We release the weights of Nereus-SAR-1, the first foundation model in our Nereus family, a series of models dedicated to ocean observation and analysis using SAR imagery, at github.com/galeio-research/nereus-sar-models/.
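The abstract does not spell out the pruning rule, so the following is only a minimal sketch of one generic way to balance a redundant, heavy-tailed dataset: cluster the embeddings and cap the number of samples kept per cluster. It should not be read as the authors' algorithm. The function names `kmeans` and `prune_balanced`, the parameters `k` and `cap`, and the use of plain k-means are all assumptions made for illustration.

```python
import numpy as np


def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of x."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct data points (fancy indexing copies).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every center.
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return labels


def prune_balanced(embeddings, k, cap, seed=0):
    """Hypothetical pruning step: keep at most `cap` samples per cluster.

    Returns the sorted indices of the samples that are kept.
    """
    labels = kmeans(embeddings, k, seed=seed)
    rng = np.random.default_rng(seed)
    kept = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if len(idx) > cap:
            # Over-represented cluster: subsample without replacement.
            idx = rng.choice(idx, size=cap, replace=False)
        kept.append(idx)
    return np.sort(np.concatenate(kept))
```

To make such a step dynamic in the sense described above, one could recompute the embeddings with the model currently being trained and call `prune_balanced` again at regular intervals, so that no pre-existing feature extractor is needed. That scheduling is likewise an assumption rather than a detail taken from the abstract.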