DaFF: Dual Attentive Feature Fusion for Multispectral Pedestrian Detection

Afnan Althoupety, Li-Yun Wang, Wu-Chi Feng, Banafsheh Rekabdar; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 2997-3006


Inspired by how humans perceive and interpret the world using multiple senses, multi-modal learning involves integrating information from multiple modalities to improve understanding and performance in various tasks. Aligning with that notion, our key intuition is to utilize multi-modal learning to solve the domain shift problem in nighttime pedestrian detection. In this paper, we show that pairing RGB and infrared (IR) image features increases the robustness of pedestrian detection at night. Indeed, this solution is not biased toward a specific time of day, as the IR domain reduces the reliance on lighting and serves as complementary information to the RGB domain. Our work aims at exploiting the power of attention mechanisms to guide a multi-modal framework in fusing features from the RGB and IR modalities. Our novel fusion approach, named dual attentive feature fusion (DaFF), leverages the duality of transformer attention and channel-wise global attention. To demonstrate the effectiveness of DaFF, we conducted experiments on two real-world multispectral pedestrian datasets. Extensive experimental results reveal the superiority of DaFF. We believe that combining the complementary properties of the RGB and IR modalities is an effective remedy for the domain shift problem in pedestrian detection.
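
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind channel-wise global attention for fusing two modalities; it is not the authors' DaFF module. It assumes both feature maps have shape (C, H, W), omits the learned layers a real network would use, and leaves out the transformer branch entirely. The function name `channel_attention_fuse` is hypothetical.

```python
import numpy as np


def channel_attention_fuse(rgb_feat, ir_feat):
    """Fuse two (C, H, W) feature maps with per-channel softmax weights.

    Illustrative assumption only: no learned parameters, no transformer branch.
    """
    if rgb_feat.shape != ir_feat.shape:
        raise ValueError("RGB and IR feature maps must share a shape")

    # Global average pooling: one scalar descriptor per channel and modality.
    rgb_desc = rgb_feat.mean(axis=(1, 2))
    ir_desc = ir_feat.mean(axis=(1, 2))

    # Softmax over the two modalities, computed independently for each channel.
    logits = np.stack([rgb_desc, ir_desc])               # shape (2, C)
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Reweight each modality channel by channel, then sum the two.
    fused = (weights[0][:, None, None] * rgb_feat
             + weights[1][:, None, None] * ir_feat)
    return fused, weights


# Usage with random stand-in features (4 channels, 8x8 spatial grid).
rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 8, 8))
ir = rng.standard_normal((4, 8, 8))
fused, weights = channel_attention_fuse(rgb, ir)
```

In this sketch, a channel whose pooled response is stronger in one modality pulls the fused map toward that modality, which is one simple way complementary features could be weighted without fixing the balance in advance.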

Related Material

@InProceedings{Althoupety_2024_CVPR,
    author    = {Althoupety, Afnan and Wang, Li-Yun and Feng, Wu-Chi and Rekabdar, Banafsheh},
    title     = {DaFF: Dual Attentive Feature Fusion for Multispectral Pedestrian Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {2997-3006}
}