@InProceedings{Zhang_2026_CVPR,
    author    = {Zhang, Junshu and Zhao, Sicheng and Zhao, Xin and Yang, Fan and Chen, Ruike and Han, Jungong and Ding, Guiguang},
    title     = {Spe-BEVHead: Rethinking the Detection Head Design for Bird's-Eye-View Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {25799-25809}
}
Spe-BEVHead: Rethinking the Detection Head Design for Bird's-Eye-View Object Detection
Abstract
Bird's-Eye-View (BEV) detection has become a dominant paradigm for 3D object detection in autonomous driving, due to its strong perception capability. However, most existing methods mainly focus on constructing high-quality BEV feature representations, while neglecting the design of task-specific detection heads. In practice, they directly adopt the center-based head originally developed for 2D detection, without any specific optimization. This leads to three inherent limitations: (i) a geometric mismatch between the Gaussian kernel used for classification and the real BEV object, (ii) degraded end-to-end performance without Non-Maximum Suppression (NMS), and (iii) sparse supervisory signals. To address these issues, we propose Spe-BEVHead, a detection head specifically tailored for BEV 3D object detection. Spe-BEVHead introduces three BEV-specific adaptations: (1) a Rotated Box Kernel that generates geometry-aligned classification weights, (2) a Local Response Refinement Module (LRRM) that suppresses non-peak responses and improves end-to-end performance, and (3) a dual-branch architecture that provides richer supervisory signals to promote more robust learning while inherently preserving the performance for end-to-end inference. Extensive experiments show that Spe-BEVHead can be seamlessly integrated into existing BEV backbones, delivering direct performance gains while retaining competitive performance under the challenging end-to-end setting.
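Limitation (i) can be illustrated with a small numerical sketch. The code below is not the paper's Rotated Box Kernel, whose formulation the abstract does not give; it only compares the footprint of an isotropic Gaussian target (the kind used by center-based heads) with the footprint of a rotated rectangle on a BEV grid. All function names, the grid size, the box dimensions, and the 0.5 threshold are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_heatmap(shape, center, sigma):
    """Isotropic Gaussian target centered at (cx, cy) on an (H, W) grid."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def rotated_box_mask(shape, center, size, yaw):
    """Binary footprint of a (length, width) box rotated by yaw radians."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    length, width = size
    dx, dy = xs - cx, ys - cy
    # Rotate grid offsets into the box's own frame (u along length, v along width).
    u = np.cos(yaw) * dx + np.sin(yaw) * dy
    v = -np.sin(yaw) * dx + np.cos(yaw) * dy
    return (np.abs(u) <= length / 2.0) & (np.abs(v) <= width / 2.0)


def footprint_iou(shape, center, size, yaw, sigma, thresh=0.5):
    """IoU between the thresholded Gaussian and the rotated box footprint."""
    gauss = gaussian_heatmap(shape, center, sigma) >= thresh
    box = rotated_box_mask(shape, center, size, yaw)
    inter = np.logical_and(gauss, box).sum()
    union = np.logical_or(gauss, box).sum()
    return inter / union


# Hypothetical elongated object (e.g. a vehicle) rotated by 45 degrees.
iou = footprint_iou((64, 64), (32, 32), size=(20, 6), yaw=np.pi / 4, sigma=4.0)
```

For an elongated, rotated object the circular Gaussian footprint both spills outside the box across its width and misses the box along its length, so the overlap is well below 1. This is one way to picture the geometric mismatch that a geometry-aligned kernel is meant to remove.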

