Super Sparse DETR: YOLO-Competitive Convergence and Acceleration

Zhu, Hebao

Hebao Zhu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026, pp. 6677-6684

Abstract

We propose Super Sparse DETR, a deployment-oriented structured sparsity training framework for DETR that brings its efficiency closer to that of YOLO-style detectors. Specifically, our method uses a decoder-derived attention aggregation map (DAM) and channel sensitivity (CS) to apply stage-wise structured sparsity to key components. Combined with skip-update of non-critical parameters during training and a gather operation at export time, these learned selections translate into deterministic reductions in sequence length and real inference speedups. To mitigate the instability and information loss introduced by sparsity, we further apply consistency distillation from an exponential moving average (EMA) teacher to the pruned model. Extensive experiments on COCO and industrial defect datasets demonstrate that Super Sparse DETR achieves substantial acceleration while maintaining accuracy, laying a foundation for DETR to catch up with YOLO in industrial scenarios.
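The abstract names the mechanisms but not their formulas. The following is a minimal pure-Python sketch of how they could fit together, under assumptions that are ours rather than the paper's: the DAM is taken to be decoder cross-attention summed over layers and queries, CS is approximated by the first-order Taylor score |w * dL/dw| per output channel, and the keep ratio and EMA decay are free hyperparameters. All function names are hypothetical. Fixing the number of kept tokens from the keep ratio is what makes the pruned sequence length deterministic, which is what lets an export-time gather produce static shapes.

```python
import math
from typing import List, Sequence


def attention_aggregation_map(cross_attn) -> List[float]:
    """Assumed DAM: sum decoder cross-attention over layers and queries.

    cross_attn[layer][query][token] is the weight a decoder query places on
    an encoder token (hypothetical layout). Returns one score per token.
    """
    num_tokens = len(cross_attn[0][0])
    scores = [0.0] * num_tokens
    for layer in cross_attn:
        for query in layer:
            for t, w in enumerate(query):
                scores[t] += w
    return scores


def topk_indices(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Keep a fixed number of tokens so the pruned length is deterministic."""
    k = max(1, math.ceil(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Re-sort the kept indices so the original token order is preserved.
    return sorted(ranked[:k])


def gather(tokens: Sequence, indices: Sequence[int]) -> list:
    """Export-time gather: materialize only the selected tokens."""
    return [tokens[i] for i in indices]


def channel_sensitivity(weights, grads) -> List[float]:
    """Assumed CS proxy: sum of |w * dL/dw| over each output channel (row)."""
    return [sum(abs(w * g) for w, g in zip(w_row, g_row))
            for w_row, g_row in zip(weights, grads)]


def masked_sgd_step(weights, grads, keep_mask, lr):
    """Skip-update: rows (channels) with keep_mask False are left unchanged."""
    return [[w - lr * g for w, g in zip(w_row, g_row)] if keep else list(w_row)
            for w_row, g_row, keep in zip(weights, grads, keep_mask)]


def ema_update(teacher, student, decay=0.999):
    """EMA teacher: teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]
```

For example, with two decoder layers, two queries and four encoder tokens, a keep ratio of 0.5 retains the two tokens that receive the most aggregated attention, in their original order; the same indices would then drive the gather in the exported graph.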

Related Material



@InProceedings{Zhu_2026_CVPR,
    author    = {Zhu, Hebao},
    title     = {Super Sparse DETR: YOLO-Competitive Convergence and Acceleration},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
    month     = {June},
    year      = {2026},
    pages     = {6677-6684}
}