[pdf]
[supp]
[bibtex]
@InProceedings{Wang_2025_ICCV,
  author    = {Wang, Heng and Jin, Mingxin and Wang, Cong and Yuan, Yuan},
  title     = {Multimodal Dual-domain Learning for Image Fusion},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2025},
  pages     = {6545-6554}
}
Multimodal Dual-domain Learning for Image Fusion
Abstract
Multimodal image fusion aims to generate high-resolution hyperspectral images by leveraging the complementary characteristics of spatially and spectrally high-resolution data. However, most existing approaches fuse only in the spatial domain, neglecting the potential of frequency-domain information. To address this limitation, this paper proposes a dual-domain learning network that effectively integrates multimodal information from both the spatial and frequency domains. To exploit both kinds of information, a core module, called the dual-domain fusion module, is customized for image fusion. It consists of two branches: a spatial-domain branch and a frequency-domain branch. The frequency-domain branch exploits the phase and amplitude information of the different modalities to achieve multimodal fusion in the frequency domain. Fusing dual-domain information helps the model mine richer contextual information and improves the detail-reasoning ability of the multimodal fusion model. Experimental results on two public datasets show that the proposed network outperforms its peer methods.
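The abstract's frequency-domain branch operates on the amplitude and phase of each modality's spectrum. The paper's actual module is a learned network; the snippet below is only a minimal, non-learned sketch of the general amplitude/phase fusion idea, using NumPy's FFT and a hypothetical blending weight `alpha` (not a parameter from the paper):

```python
import numpy as np

def frequency_fuse(img_a, img_b, alpha=0.5):
    """Illustrative amplitude/phase fusion of two single-channel images
    in the frequency domain (a generic sketch, not the paper's module).

    img_a, img_b: 2-D float arrays of the same shape.
    alpha: hypothetical weight blending the two amplitude spectra.
    """
    Fa = np.fft.fft2(img_a)
    Fb = np.fft.fft2(img_b)
    # Decompose each spectrum into amplitude and phase, then blend amplitudes.
    amp = alpha * np.abs(Fa) + (1.0 - alpha) * np.abs(Fb)
    # Phase is taken from img_a here; a learned branch would fuse both phases.
    phase = np.angle(Fa)
    # Recombine and transform back to the spatial domain.
    fused = np.fft.ifft2(amp * np.exp(1j * phase))
    return np.real(fused)

# Tiny demo on random "images".
rng = np.random.default_rng(0)
a = rng.random((8, 8))
b = rng.random((8, 8))
out = frequency_fuse(a, b)
print(out.shape)  # (8, 8)
```

With `alpha=1.0` the function reconstructs `img_a` exactly (amplitude and phase both come from `img_a`), which is a convenient sanity check for the round trip through the FFT.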
Related Material
