A Universal Structure of YOLO Series Small Object Detection Models

Shengchao Hu, Xiao Liu, Weijun Wang, Tianlun Huang, Wei Feng; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 3706-3722

Abstract


The YOLO series of detection models plays a crucial role in object detection tasks. However, these models are typically trained on datasets captured from standard viewing angles. Datasets such as VisDrone2021 and TinyPerson contain small, dense, and numerous objects that conventional object detection models struggle to detect effectively. We therefore propose a universal structure for all YOLO series models that enhances their ability to detect small objects. First, we use a large-scale feature map as a new detection branch to address the loss of small-object features. Second, we develop a detail-guide-block (DGB) to enhance the model's ability to detect fine details, along with a feature-refine-module (FRM) that mitigates the feature flattening caused by upsampling. Finally, we remove the fourth detection branch, which did not significantly improve detection accuracy; this improves the model's execution speed to some extent and reduces its complexity. We port our structure to YOLOX, YOLOv7, and YOLOv8 and conduct extensive experiments on the VisDrone2021 and TinyPerson datasets. The experimental results demonstrate that our improved models consistently outperform the original models.
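To illustrate why a larger feature map can help with small objects, the following minimal sketch computes how many feature-map cells an object spans at different detection strides. The stride values, the 640-pixel input size, and the 12-pixel object size are assumptions chosen for illustration (they follow common YOLO conventions); the abstract does not state which pyramid levels are added or removed, and the function names are hypothetical.

```python
# Illustrative arithmetic only. The strides below are the usual YOLO pyramid
# levels and are an assumption; the abstract does not specify them.

def grid_size(input_size: int, stride: int) -> int:
    """Side length of the square feature map produced at a given stride."""
    return input_size // stride


def cells_covered(object_size: int, stride: int) -> float:
    """Approximate number of feature-map cells an object spans along one side."""
    return object_size / stride


def describe_heads(input_size: int, strides: list[int], object_size: int):
    """Return (stride, grid side, cells spanned per object side) for each head."""
    return [
        (s, grid_size(input_size, s), cells_covered(object_size, s))
        for s in strides
    ]


# Typical three-head YOLO layout (assumption).
BASELINE_STRIDES = [8, 16, 32]
# Hypothetical modified layout: add a stride-4 head, drop the stride-32 head.
MODIFIED_STRIDES = [4, 8, 16]

if __name__ == "__main__":
    for name, strides in (("baseline", BASELINE_STRIDES),
                          ("modified", MODIFIED_STRIDES)):
        print(name)
        for stride, grid, cells in describe_heads(640, strides, 12):
            print(f"  stride {stride:>2}: {grid}x{grid} grid, "
                  f"12 px object spans {cells:.2f} cells per side")
```

Under these assumptions, a 12-pixel object spans less than one cell at stride 16 and above, while a stride-4 feature map still resolves it over several cells, which is consistent with the motivation for adding a large-scale detection branch.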

Related Material


[pdf]
[bibtex]
@InProceedings{Hu_2024_ACCV,
    author    = {Hu, Shengchao and Liu, Xiao and Wang, Weijun and Huang, Tianlun and Feng, Wei},
    title     = {A Universal Structure of YOLO Series Small Object Detection Models},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {3706-3722}
}