RS-SAM: Integrating Multi-Scale Information for Enhanced Remote Sensing Image Segmentation

Enkai Zhang, Jingjing Liu, Anda Cao, Zhen Sun, Haofei Zhang, Huiqiong Wang, Li Sun, Mingli Song; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 994-1010

Abstract


The introduction of the Segment Anything Model (SAM) provides a powerful pre-trained model for image segmentation tasks. However, its use in remote sensing image segmentation encounters notable challenges. First, SAM is a general visual model trained primarily on large-scale natural images, which hinders its direct application to the remote sensing field. Second, due to the diversity of spatial objects in remote sensing images, the naive columnar ViT structure of SAM leads to poor segmentation performance. Finally, SAM is designed primarily to distinguish between foreground and background, resulting in a simple structure that struggles with precise semantic segmentation. To address these issues, we introduce RS-SAM, a prompt-free adaptation of SAM to remote sensing with a multi-scale ViT backbone. More specifically, we start by crafting an adapter for the SAM encoder to transfer SAM to the remote sensing domain. Next, we address the encoder's limitations by integrating a Multi-scale Neck that captures objects of different sizes. Finally, to enhance the segmentation results, we propose a Multi-scale Progressive Refinement Module that aggregates multi-scale and low-level features. In experiments on three public remote sensing datasets, our model outperforms the baseline by 0.8% to 6.2% on the Dice metric, demonstrating the effectiveness of our method.
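
For readers unfamiliar with the Dice metric mentioned above, the following is a minimal sketch of the standard Dice coefficient for two binary masks, 2|A∩B| / (|A| + |B|). It is not taken from the paper: the function name `dice_score`, the convention of returning 1.0 when both masks are empty, and the toy masks are assumptions for illustration, and the paper's evaluation may differ in details such as per-class averaging.

```python
from typing import Sequence


def dice_score(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Dice coefficient between two same-shaped binary (0/1) masks."""
    intersection = 0  # pixels that are 1 in both masks
    pred_sum = 0      # pixels that are 1 in the prediction
    target_sum = 0    # pixels that are 1 in the ground truth
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += p * t
            pred_sum += p
            target_sum += t
    # Assumed convention: two empty masks count as perfect agreement.
    if pred_sum + target_sum == 0:
        return 1.0
    return 2.0 * intersection / (pred_sum + target_sum)


# Hypothetical 2x3 masks, used only to exercise the function.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
print(dice_score(pred_mask, true_mask))
```

In this toy case the masks share 2 foreground pixels and each contains 3, so the score is 2·2 / (3 + 3), about 0.667.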

Related Material


[bibtex]
@InProceedings{Zhang_2024_ACCV,
    author    = {Zhang, Enkai and Liu, Jingjing and Cao, Anda and Sun, Zhen and Zhang, Haofei and Wang, Huiqiong and Sun, Li and Song, Mingli},
    title     = {RS-SAM: Integrating Multi-Scale Information for Enhanced Remote Sensing Image Segmentation},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {994-1010}
}