Harmonizing Attention Fields with Knowledge Distillation for Multi-View 3D Object Detection

Yafei Qi, Menghao Yang, Fan Wu, Chen Wang, Yongmin Zhang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2025, pp. 3768-3776

Abstract


Multi-view 3D object detection has achieved accuracy comparable to that of LiDAR-based methods in autonomous driving, benefiting from the powerful spatial modeling capability of transformer architectures. Knowledge distillation offers a promising solution for 3D detection by transferring knowledge from complex teacher models to lightweight student models. However, aligning knowledge between teacher and student models is challenging because of the sparse attention fields and unordered outputs of 3D object queries. We propose a knowledge distillation framework that enhances the attention fields of 3D object queries through systematic alignment of global and local knowledge. Our method introduces a balanced distillation strategy that harmonizes diverse valuable regions across global features while effectively extracting the embedded "dark knowledge". Additionally, we reconstruct the attention fields of significant 3D object queries from a Bird's-Eye View (BEV) perspective, facilitating local knowledge alignment. Comprehensive experiments on state-of-the-art multi-view 3D object detection benchmarks validate the effectiveness of our approach. With a computational complexity of only 57.8 GFLOPs, 2 to 10 times lower than that of state-of-the-art models, our scheme achieves competitive performance with an NDS score of 52.33 and improves on the baseline by 4.64 NDS and 4.99 mAP on the nuScenes dataset. The code is available at https://github.com/helloworld77/HarmonDistill.
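The term "dark knowledge" comes from the generic distillation loss of Hinton et al. (2015), where a teacher's temperature-softened class probabilities reveal how it ranks the wrong classes. As a point of reference only, the sketch below shows that generic loss in plain Python. It is not the authors' balanced global/local distillation scheme; the function names `softmax` and `kd_loss` and the default temperature of 4.0 are illustrative assumptions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: List[float],
            teacher_logits: List[float],
            temperature: float = 4.0) -> float:
    """Generic Hinton-style distillation loss: T^2 * KL(p_teacher || p_student).

    Illustrative sketch only; the temperature value is an assumption.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL divergence from the softened teacher to the softened student
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures
    return (temperature ** 2) * kl


teacher = [5.0, 2.0, 0.5]
student = [3.0, 2.5, 1.0]
print(softmax(teacher, 1.0))  # sharp: almost all mass on the first class
print(softmax(teacher, 4.0))  # softened: secondary classes become visible
print(kd_loss(student, teacher))
```

Raising the temperature exposes the teacher's relative preferences among non-target classes, which is the information the student is trained to match. Applying such a term to a query-based 3D detector would additionally require pairing each teacher query with a student query, since, as the abstract notes, query outputs are unordered.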

Related Material

@InProceedings{Qi_2025_CVPR,
    author    = {Qi, Yafei and Yang, Menghao and Wu, Fan and Wang, Chen and Zhang, Yongmin},
    title     = {Harmonizing Attention Fields with Knowledge Distillation for Multi-View 3D Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {3768-3776}
}