Event-based Video Super-Resolution via State Space Models

Xiao, Zeyu; Wang, Xinchao

Zeyu Xiao, Xinchao Wang; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 12564-12574

Abstract

Exploiting temporal correlations is crucial for video super-resolution (VSR), and recent approaches strengthen this by incorporating event cameras. In this paper, we introduce MamEVSR, a network for event-based VSR built on the selective state space model Mamba. MamEVSR offers a global receptive field with linear computational complexity, thereby addressing limitations of convolutional neural networks and Transformers. The key components of MamEVSR are: (1) The interleaved Mamba (iMamba) block, which interleaves tokens from adjacent frames and applies multi-directional selective state space modeling, enabling efficient feature fusion and bidirectional propagation across frames while maintaining linear complexity. (2) The cross-modality Mamba (cMamba) block, which facilitates further interaction and aggregation between event information and the output of the iMamba block. The cMamba block leverages complementary spatio-temporal information from both modalities, allowing MamEVSR to capture finer motion details. Experimental results show that MamEVSR achieves superior performance on various datasets, both quantitatively and qualitatively.
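
The token interleaving and bidirectional scanning described for the iMamba block can be illustrated with a toy sketch. This is not the authors' implementation: it assumes scalar tokens, a hand-written sigmoid gate in place of Mamba's learned selective parameters, and only two scan directions. The function names (`interleave_tokens`, `selective_scan`, `bidirectional_scan`, `deinterleave`) are hypothetical and chosen for illustration only.

```python
import math


def interleave_tokens(frame_a, frame_b):
    """Interleave two equal-length token sequences: a0, b0, a1, b1, ..."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of tokens")
    out = []
    for tok_a, tok_b in zip(frame_a, frame_b):
        out.append(tok_a)
        out.append(tok_b)
    return out


def selective_scan(tokens, w_gate=1.0):
    """Toy input-dependent linear recurrence (an assumption, not Mamba itself).

    h_t = a_t * h_{t-1} + (1 - a_t) * x_t, with a_t = sigmoid(w_gate * x_t).
    A single pass over the sequence, so the cost is linear in its length.
    """
    h = 0.0
    outputs = []
    for x in tokens:
        a = 1.0 / (1.0 + math.exp(-w_gate * x))  # gate depends on the input
        h = a * h + (1.0 - a) * x
        outputs.append(h)
    return outputs


def bidirectional_scan(tokens, w_gate=1.0):
    """Sum a forward scan and a backward scan over the same sequence."""
    fwd = selective_scan(tokens, w_gate)
    bwd = selective_scan(tokens[::-1], w_gate)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


def deinterleave(tokens):
    """Split an interleaved sequence back into its two source frames."""
    return tokens[0::2], tokens[1::2]


# Hypothetical scalar "tokens" from two adjacent frames.
frame_prev = [0.1, 0.2, 0.3]
frame_next = [1.0, 2.0, 3.0]

mixed = interleave_tokens(frame_prev, frame_next)
fused = bidirectional_scan(mixed)
prev_out, next_out = deinterleave(fused)
```

Because the two frames share one interleaved sequence, every output token depends on tokens from both frames after the forward and backward passes, while each pass still visits each token exactly once.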

Related Material

[bibtex]

@InProceedings{Xiao_2025_CVPR,
    author    = {Xiao, Zeyu and Wang, Xinchao},
    title     = {Event-based Video Super-Resolution via State Space Models},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {12564-12574}
}