ESM-YOLO: Enhanced Small Target Detection Based on Visible and Infrared Multi-modal Fusion

Qianqian Zhang, Linwei Qiu, Li Zhou, Junshe An; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 1454-1469

Abstract


Detecting small targets in remote sensing imagery is frequently impeded by target faintness and complex backgrounds, resulting in reduced accuracy. This work introduces an Enhanced Small target detection Method, termed ESM-YOLO, which leverages multi-modal fusion of visible and infrared data to strengthen inter-modality correlation and thereby improve detection performance. First, we devise a pixel-level Bilateral Excitation Fusion (BEF) module that symmetrically and efficiently extracts both shared and modality-specific features. We then incorporate an Improved Atrous Spatial Pyramid Pooling (IASPP) unit and a Compact BottleneckCSP (CBCSP) unit into the detection architecture; both are tailored to capture small-object features while balancing computational efficiency against feature representation capability. Experimental results show that ESM-YOLO achieves 82.42% accuracy on the widely used Vehicle Detection in Aerial Imagery (VEDAI) dataset. Extensive experiments demonstrate the effectiveness and superiority of the proposed method.
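To make the fusion idea concrete, the sketch below shows one way a pixel-level bilateral excitation fusion block for visible/infrared feature maps could be structured in PyTorch. The module name, layer choices, and cross-gating scheme are illustrative assumptions only, not the authors' BEF implementation, which is described in the full paper.

```python
# Minimal, illustrative sketch of pixel-level bilateral excitation fusion.
# All names and design choices here are assumptions for illustration;
# they are NOT the authors' released code.
import torch
import torch.nn as nn


class BilateralExcitationFusion(nn.Module):
    """Symmetrically fuse visible (RGB) and infrared (IR) feature maps.

    Each modality produces a per-pixel excitation map that re-weights the
    other modality, so shared features are reinforced while modality-unique
    features are preserved, before a 1x1 projection merges both streams.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Per-pixel sigmoid gates generated from each modality.
        self.gate_rgb = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1), nn.Sigmoid()
        )
        self.gate_ir = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1), nn.Sigmoid()
        )
        # 1x1 projection back to the original channel count after concat.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_rgb: torch.Tensor, feat_ir: torch.Tensor) -> torch.Tensor:
        # Cross-excitation: each stream is modulated by the other modality,
        # with a residual path so unique features are not suppressed.
        excited_rgb = feat_rgb * self.gate_ir(feat_ir) + feat_rgb
        excited_ir = feat_ir * self.gate_rgb(feat_rgb) + feat_ir
        return self.fuse(torch.cat([excited_rgb, excited_ir], dim=1))


if __name__ == "__main__":
    # Example: fuse two 64-channel feature maps at a stride-8 pyramid level.
    rgb = torch.randn(1, 64, 64, 64)
    ir = torch.randn(1, 64, 64, 64)
    fused = BilateralExcitationFusion(64)(rgb, ir)
    print(fused.shape)  # torch.Size([1, 64, 64, 64])
```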

Related Material


[pdf]
[bibtex]
@InProceedings{Zhang_2024_ACCV,
    author    = {Zhang, Qianqian and Qiu, Linwei and Zhou, Li and An, Junshe},
    title     = {ESM-YOLO: Enhanced Small Target Detection Based on Visible and Infrared Multi-modal Fusion},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {1454-1469}
}