@InProceedings{Wang_2025_WACV,
    author    = {Wang, Shuo and Xia, Chunlong and Lv, Feng and Shi, Yifeng},
    title     = {RT-DETRv3: Real-Time End-to-End Object Detection with Hierarchical Dense Positive Supervision},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {1628-1636}
}
RT-DETRv3: Real-Time End-to-End Object Detection with Hierarchical Dense Positive Supervision
Abstract
RT-DETR is the first real-time end-to-end transformer-based object detector. Its efficiency comes from the framework design and the Hungarian matching. However, compared to dense supervision detectors like the YOLO series, the Hungarian matching provides much sparser supervision, leading to insufficient model training and making it difficult to achieve optimal results. To address these issues, we propose a hierarchical dense positive supervision method based on RT-DETR, named RT-DETRv3. First, we introduce a CNN-based auxiliary branch that provides dense supervision and collaborates with the original decoder to enhance the encoder's feature representation. Second, to address insufficient decoder training, we propose a novel learning strategy involving self-attention perturbation. This strategy diversifies label assignment for positive samples across multiple query groups, thereby enriching positive supervision. Additionally, we introduce a shared-weight decoder branch for dense positive supervision to ensure that more high-quality queries match each ground truth. Notably, all aforementioned modules are training-only. We conduct extensive experiments on COCO val2017 to demonstrate the effectiveness of our approach. RT-DETRv3 significantly outperforms existing real-time detectors, including the RT-DETR series and the YOLO series. For example, RT-DETRv3-R18 achieves 48.1% AP (+1.6%/+1.4%) compared to RT-DETR-R18/RT-DETRv2-R18 while maintaining the same latency. Furthermore, RT-DETRv3-R101 attains 54.6% AP, outperforming YOLOv10-X. The code will be released at https://github.com/clxia12/RT-DETRv3.
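The contrast the abstract draws between sparse one-to-one (Hungarian) matching and dense supervision can be illustrated with a toy example. The sketch below is not the paper's implementation: the cost matrix is made up, the one-to-one assignment is found by brute force rather than by the Hungarian algorithm (both return the minimum-cost assignment), and the top-k rule is only a hypothetical stand-in for the dense assigners used by YOLO-style detectors. The function names `one_to_one_match` and `dense_match` are illustrative.

```python
from itertools import permutations


def one_to_one_match(cost):
    """Minimum-cost one-to-one assignment, found by brute force.

    cost[q][g] is the matching cost between candidate q and ground truth g.
    The Hungarian algorithm computes the same result efficiently.
    Returns a dict mapping ground-truth index -> candidate index.
    """
    num_candidates = len(cost)
    num_gt = len(cost[0])
    best, best_total = None, float("inf")
    # Try every ordered choice of num_gt distinct candidates.
    for chosen in permutations(range(num_candidates), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(chosen))
        if total < best_total:
            best, best_total = chosen, total
    return {g: q for g, q in enumerate(best)}


def dense_match(cost, k):
    """One-to-many assignment: each ground truth keeps its k cheapest candidates.

    This top-k rule is an assumption made for illustration only.
    """
    num_candidates = len(cost)
    num_gt = len(cost[0])
    return {
        g: sorted(range(num_candidates), key=lambda q: cost[q][g])[:k]
        for g in range(num_gt)
    }


# Toy cost matrix: 6 candidates (rows) x 2 ground-truth boxes (columns).
cost = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.3, 0.7],
    [0.9, 0.1],
    [0.8, 0.3],
    [0.7, 0.6],
]

sparse = one_to_one_match(cost)
dense = dense_match(cost, k=3)

# Number of positive samples that receive a supervision signal.
num_sparse_pos = len(sparse)
num_dense_pos = sum(len(v) for v in dense.values())
print(num_sparse_pos, num_dense_pos)
```

With one-to-one matching, each ground truth supervises exactly one candidate, so the number of positives equals the number of objects in the image. A one-to-many rule supervises several candidates per object, which is the kind of denser positive signal the auxiliary branches in RT-DETRv3 are described as supplying during training.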
Related Material