Multimodal Object Detection by Channel Switching and Spatial Attention

Yue Cao, Junchi Bin, Jozsef Hamari, Erik Blasch, Zheng Liu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 403-411

Abstract


Multimodal object detection has attracted great attention in recent years because the information specific to different modalities can complement each other and effectively improve the accuracy and stability of the detection model. However, compared to processing the inputs from a single modality, fusing information from multiple modalities can significantly increase the computational complexity of the model, thus impairing its efficiency. Therefore, the multimodal fusion module needs to be carefully designed to enhance the performance of the detection model while keeping the computational cost low. In this paper, we propose a novel lightweight fusion module that can efficiently fuse the inputs from different modalities using channel switching and spatial attention (CSSA). The effectiveness and generalizability of the module are tested using two public multimodal datasets, LLVIP and FLIR, both of which comprise paired infrared (IR) and visible (RGB) images. The experiments demonstrate that the proposed CSSA module can substantially improve the accuracy of multimodal object detection without consuming excessive computing resources.

Related Material


[bibtex]
@InProceedings{Cao_2023_CVPR,
    author    = {Cao, Yue and Bin, Junchi and Hamari, Jozsef and Blasch, Erik and Liu, Zheng},
    title     = {Multimodal Object Detection by Channel Switching and Spatial Attention},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {403-411}
}