Learning Occupancy for Monocular 3D Object Detection

Liang Peng, Junkai Xu, Haoran Cheng, Zheng Yang, Xiaopei Wu, Wei Qian, Wenxiao Wang, Boxi Wu, Deng Cai; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 10281-10292

Abstract


Monocular 3D detection is a challenging task due to the lack of accurate 3D information. Existing approaches typically rely on geometry constraints and dense depth estimates to facilitate learning, but often fail to fully exploit the benefits of three-dimensional feature extraction in frustum and 3D space. In this paper, we propose OccupancyM3D, a method of learning occupancy for monocular 3D detection. It directly learns occupancy in frustum and 3D space, leading to more discriminative and informative 3D features and representations. Specifically, by using synchronized raw sparse LiDAR point clouds, we define the space status and generate voxel-based occupancy labels. We formulate occupancy prediction as a simple classification problem and design associated occupancy losses. The resulting occupancy estimates are employed to enhance the original frustum/3D features. Experiments on the KITTI and Waymo Open datasets demonstrate that the proposed method achieves a new state of the art and surpasses other methods by a significant margin.
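
To make the idea of voxel-based occupancy labels and occupancy-as-classification concrete, here is a minimal sketch under stated assumptions. The abstract does not give the paper's space-status definition or its exact losses, so this toy version simply treats any voxel containing at least one LiDAR point as occupied and every other voxel as not occupied, and scores predictions with a plain binary cross-entropy. The function names `occupied_voxels` and `binary_cross_entropy`, the grid parameters, and the sample points are all hypothetical and chosen only for illustration.

```python
import math

def occupied_voxels(points, grid_min, voxel_size, grid_shape):
    """Return the set of (i, j, k) voxel indices that contain at least one point."""
    occupied = set()
    for point in points:
        # Map each coordinate to an integer voxel index along its axis.
        index = tuple(
            math.floor((coord - origin) / voxel_size)
            for coord, origin in zip(point, grid_min)
        )
        # Keep only points that fall inside the grid bounds.
        if all(0 <= i < n for i, n in zip(index, grid_shape)):
            occupied.add(index)
    return occupied

def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

# Toy example (assumed values): three points, a 2 x 2 x 2 grid of unit voxels.
points = [(0.1, 0.1, 0.1), (1.5, 0.2, 0.3), (9.0, 9.0, 9.0)]
grid_shape = (2, 2, 2)
occupied = occupied_voxels(points, (0.0, 0.0, 0.0), 1.0, grid_shape)

# Flatten the grid into per-voxel 0/1 labels and score a constant 0.5 prediction.
labels = [
    1 if (i, j, k) in occupied else 0
    for i in range(grid_shape[0])
    for j in range(grid_shape[1])
    for k in range(grid_shape[2])
]
loss = binary_cross_entropy([0.5] * len(labels), labels)
```

In this sketch the third point lies outside the grid and is discarded, so only two voxels are labeled occupied. A real pipeline would likely also distinguish unobserved space from free space, which this simplification does not attempt.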

Related Material


[bibtex]
@InProceedings{Peng_2024_CVPR,
    author    = {Peng, Liang and Xu, Junkai and Cheng, Haoran and Yang, Zheng and Wu, Xiaopei and Qian, Wei and Wang, Wenxiao and Wu, Boxi and Cai, Deng},
    title     = {Learning Occupancy for Monocular 3D Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {10281-10292}
}