MambaML: Exploring State Space Models for Multi-Label Image Classification

Xuelin Zhu, Jian Liu, Jiuxin Cao, Bing Wang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 4743-4753

Abstract


Mamba, a selective state-space model, has recently seen widespread application across various visual tasks due to its exceptional ability to capture long-range dependencies. While promising results have been demonstrated in image classification, its potential in multi-label image classification remains underexplored. To bridge this gap, we propose a novel Mamba-based decoder, which utilizes the intrinsic attention of Mamba to aggregate visual information from image features into label embeddings, yielding label-specific visual representations. Building upon this, a MambaML framework is developed for multi-label image classification, which models the self-correlations of image features and label embeddings with bi-directional Mamba, as well as their cross-correlations with the Mamba-based decoder, allowing visual spatial relationships, label semantic dependencies, and cross-modal associations to be explored in a unified system. In this way, robust label-specific visual representations are acquired, facilitating the training of binary classifiers towards accurate label recognition. Experiments on public benchmarks suggest that our MambaML achieves performance comparable to state-of-the-art methods in multi-label image classification, while requiring fewer parameters and less computational overhead.
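
The data flow described above (label embeddings pooling visual information from image features, followed by one binary classifier per label) can be illustrated with a toy sketch. It is only a stand-in under stated assumptions: the abstract does not specify the internals of the Mamba-based decoder, so plain scaled dot-product attention is swapped in for the aggregation step, and the bi-directional Mamba self-correlation stages are omitted entirely. All function names, dimensions, and the random inputs are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def label_specific_features(label_embeddings, image_features):
    """Aggregate image features into one vector per label.

    ASSUMPTION: scaled dot-product attention is used as a stand-in here;
    it is NOT the paper's Mamba-based decoder.
    """
    dim = len(image_features[0])
    scale = 1.0 / math.sqrt(dim)
    pooled = []
    for label in label_embeddings:
        # Attention weights of this label over all image positions.
        weights = softmax([scale * dot(label, feat) for feat in image_features])
        # Weighted average of image features: the label-specific representation.
        pooled.append([
            sum(w * feat[d] for w, feat in zip(weights, image_features))
            for d in range(dim)
        ])
    return pooled


def predict(label_features, classifier_weights, classifier_biases):
    """Apply one independent binary (sigmoid) classifier per label."""
    return [
        1.0 / (1.0 + math.exp(-(dot(w, f) + b)))
        for f, w, b in zip(label_features, classifier_weights, classifier_biases)
    ]


def random_matrix(rows, cols, rng):
    """Hypothetical helper: Gaussian toy data in place of learned tensors."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
NUM_PATCHES, NUM_LABELS, DIM = 6, 3, 4  # toy sizes, chosen arbitrarily
image_features = random_matrix(NUM_PATCHES, DIM, rng)
label_embeddings = random_matrix(NUM_LABELS, DIM, rng)
classifier_weights = random_matrix(NUM_LABELS, DIM, rng)
classifier_biases = [0.0] * NUM_LABELS

label_features = label_specific_features(label_embeddings, image_features)
probabilities = predict(label_features, classifier_weights, classifier_biases)
print(probabilities)  # one independent probability per label
```

Each label receives its own pooled feature vector and its own sigmoid output, so the predicted probabilities are independent and need not sum to one, which is what distinguishes the multi-label setting from single-label classification.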

Related Material


@InProceedings{Zhu_2025_ICCV,
    author    = {Zhu, Xuelin and Liu, Jian and Cao, Jiuxin and Wang, Bing},
    title     = {MambaML: Exploring State Space Models for Multi-Label Image Classification},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {4743-4753}
}