Mamba-Adaptor: State Space Model Adaptor for Visual Recognition

Xie, Fei; Nie, Jiahao; Tang, Yujin; Zhang, Wenkang; Zhao, Hongshen

Fei Xie, Jiahao Nie, Yujin Tang, Wenkang Zhang, Hongshen Zhao; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 20124-20134

Abstract

Recent State Space Models (SSMs), especially Mamba, have demonstrated impressive performance in visual modeling and possess superior model efficiency. However, applying Mamba to visual tasks yields inferior performance because of three main constraints inherent in the sequential model: 1) causal computation cannot access global context; 2) long-range forgetting occurs when computing the current hidden states; 3) spatial structural modeling is weak because the image is transformed into a sequential input. To address these issues, we investigate a simple yet powerful vision task adaptor for Mamba models, which consists of two functional modules: Adaptor-T and Adaptor-S. When solving the hidden states for the SSM, we apply a causal prediction module, Adaptor-T, to select a set of learnable locations as memory augmentation feature states to ease the long-range forgetting issue. Moreover, we leverage Adaptor-S, composed of multi-scale dilated convolutional kernels, to enhance spatial modeling and introduce the image inductive bias into the feature output. Both modules enlarge the context available under causal computation, since the output is enhanced by features that would otherwise be inaccessible. We explore three usages of Mamba-Adaptor: 1) a general visual backbone for various vision tasks; 2) a booster module to raise the performance of pretrained backbones; 3) a highly efficient fine-tuning module that adapts the base model for transfer learning tasks. Extensive experiments verify the effectiveness of Mamba-Adaptor in all three settings. Notably, our Mamba-Adaptor achieves state-of-the-art performance on the ImageNet and COCO benchmarks.
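
The first two constraints named in the abstract, causal computation and long-range forgetting, can be illustrated with a toy recurrence. The sketch below is not the paper's model: it assumes a scalar linear state update h_t = a * h_{t-1} + b * x_t with fixed, hypothetical coefficients `a` and `b`, whereas Mamba uses input-dependent parameters and vector-valued states. It only shows that a state at position t never depends on later inputs, and that the contribution of an early input shrinks geometrically when |a| < 1.

```python
def causal_scan(xs, a=0.9, b=1.0):
    """Toy scalar recurrence h_t = a * h_{t-1} + b * x_t.

    `a` and `b` are hypothetical fixed coefficients chosen for illustration;
    they are not taken from the paper.
    """
    h = 0.0
    states = []
    for x in xs:
        h = a * h + b * x  # the state only ever sees inputs up to position t
        states.append(h)
    return states


# Long-range forgetting: a unit impulse at position 0 decays as a**t.
impulse = [1.0] + [0.0] * 49
states = causal_scan(impulse)

# Causality: perturbing a later token leaves all earlier states unchanged.
perturbed = list(impulse)
perturbed[30] = 5.0
states_perturbed = causal_scan(perturbed)

print(f"contribution of token 0 at t=49: {states[49]:.4f}")
print("states before t=30 unchanged:", states[:30] == states_perturbed[:30])
```

In this toy setting, position 10 cannot be influenced by position 30 at all, and position 49 retains well under one percent of the first token's signal. These are the two effects the abstract describes Adaptor-T and Adaptor-S as easing.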

Related Material

[bibtex]

@InProceedings{Xie_2025_CVPR,
    author    = {Xie, Fei and Nie, Jiahao and Tang, Yujin and Zhang, Wenkang and Zhao, Hongshen},
    title     = {Mamba-Adaptor: State Space Model Adaptor for Visual Recognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {20124-20134}
}