Distribution-Aligned Multimodal Fusion for Robust Object Detection

Hao, Xiaohui; Pu, Yanglin; Wang, Yongjun; She, Rui

Xiaohui Hao, Yanglin Pu, Yongjun Wang, Rui She; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 25494-25503

Abstract

Cross-degradation generalization remains a critical challenge for RGB-infrared multimodal object detection, especially when the training data covers only a limited set of degradation types. This paper presents a distribution alignment framework built on a key insight: fused features should be aligned to the pretrained distribution on which the frozen detector performs optimally, rather than adapted to training-specific degradations. By freezing the pretrained detector and training only a lightweight fusion module, our approach leverages complementary infrared information to reduce distribution shift while maintaining computational efficiency. The method achieves state-of-the-art results on three benchmarks with 4x faster training. Critically, we demonstrate that aligning to the pretrained distribution substantially outperforms aligning to training degradations when generalizing to unseen scenarios.

Related Material

@InProceedings{Hao_2026_CVPR,
    author    = {Hao, Xiaohui and Pu, Yanglin and Wang, Yongjun and She, Rui},
    title     = {Distribution-Aligned Multimodal Fusion for Robust Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {25494-25503}
}