CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation

Lingjun Zhao, Jingyu Song, Katherine A. Skinner; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 15470-15480

Abstract


In the field of 3D object detection for autonomous driving, LiDAR-Camera (LC) fusion is the top-performing sensor configuration. Still, LiDAR is relatively high in cost, which hinders adoption of this technology for consumer automobiles. Alternatively, camera and radar are commonly deployed on vehicles already on the road today, but the performance of Camera-Radar (CR) fusion falls behind LC fusion. In this work, we propose Camera-Radar Knowledge Distillation (CRKD) to bridge the performance gap between LC and CR detectors with a novel cross-modality KD framework. We use the Bird's-Eye-View (BEV) representation as the shared feature space to enable effective knowledge distillation. To accommodate the unique cross-modality KD path, we propose four distillation losses to help the student learn crucial features from the teacher model. We present extensive evaluations on the nuScenes dataset to demonstrate the effectiveness of the proposed CRKD framework. The project page for CRKD is https://song-jingyu.github.io/CRKD.
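
To make the cross-modality KD idea concrete, below is a minimal, hedged sketch of one common ingredient of such frameworks: a feature-mimicking loss in the shared BEV space, where a frozen LC teacher supervises the CR student's BEV features alongside the usual detection loss. This is an illustrative sketch only; the function names, the optional foreground masking, and the loss weighting are assumptions for exposition and do not reproduce the paper's four distillation losses.

```python
# Sketch of BEV-space feature distillation (illustrative; not the CRKD losses).
from typing import Optional

import torch
import torch.nn.functional as F


def bev_feature_distillation_loss(student_bev: torch.Tensor,
                                  teacher_bev: torch.Tensor,
                                  fg_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    """L2 loss between student and teacher BEV feature maps of shape (B, C, H, W).

    An optional foreground mask of shape (B, 1, H, W) can up-weight regions
    near ground-truth objects, a common choice in detection KD.
    """
    diff = (student_bev - teacher_bev) ** 2
    if fg_mask is not None:
        diff = diff * fg_mask
        return diff.sum() / fg_mask.sum().clamp(min=1.0)
    return diff.mean()


# Hypothetical training step: the student's total loss combines its detection
# loss with a weighted distillation term computed against the frozen teacher.
#
# with torch.no_grad():
#     teacher_bev = teacher(lidar_points, camera_images)        # frozen LC teacher
# student_bev, det_preds = student(camera_images, radar_points)  # CR student
# loss = detection_loss(det_preds, targets) \
#        + lambda_kd * bev_feature_distillation_loss(student_bev, teacher_bev, fg_mask)
```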

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Zhao_2024_CVPR,
    author    = {Zhao, Lingjun and Song, Jingyu and Skinner, Katherine A.},
    title     = {CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {15470-15480}
}