OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction

Ji Zhang, Yiran Ding, Zixin Liu; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 3587-3604

Abstract


3D occupancy prediction based on multi-sensor fusion, crucial for a reliable autonomous driving system, enables fine-grained understanding of 3D scenes. Previous fusion-based 3D occupancy prediction methods relied on depth estimation to process 2D image features. However, depth estimation is an ill-posed problem, hindering the accuracy and robustness of these methods. Furthermore, fine-grained occupancy prediction demands extensive computational resources. To address these issues, we propose OccFusion, a depth estimation free multi-modal fusion framework. Additionally, we introduce a generalizable active training method and an active decoder that can be applied to any occupancy prediction model, with the potential to enhance their performance. Experiments conducted on nuScenes-Occupancy and nuScenes-Occ3D demonstrate our framework's superior performance. Detailed ablation studies highlight the effectiveness of each proposed method.
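
As a rough illustration of the general idea only (not the paper's actual architecture), the sketch below shows one common depth-estimation-free way to associate 2D image features with 3D space: project known 3D voxel centers into the image plane using camera calibration and bilinearly sample image features at those pixels, so no per-pixel depth distribution has to be predicted. All function and variable names here are hypothetical and assumed for illustration.

    # Hedged sketch: lift 2D image features to 3D voxels without depth estimation,
    # by projecting voxel centers into the image and sampling features there.
    # Function names are illustrative, not taken from the OccFusion paper.
    import torch
    import torch.nn.functional as F

    def project_voxels_to_image(voxel_centers, cam_intrinsic, cam_extrinsic):
        """Project (N, 3) voxel centers in ego coordinates onto the image plane.

        Returns pixel coordinates (N, 2) and a mask for points in front of the camera.
        """
        ones = torch.ones_like(voxel_centers[:, :1])
        pts_h = torch.cat([voxel_centers, ones], dim=1)          # (N, 4) homogeneous
        cam_pts = (cam_extrinsic @ pts_h.T).T[:, :3]             # ego -> camera frame
        in_front = cam_pts[:, 2] > 1e-3                          # keep points with z > 0
        uv = (cam_intrinsic @ cam_pts.T).T                       # perspective projection
        uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-3)              # (N, 2) pixel coords
        return uv, in_front

    def sample_image_features(image_feats, uv, valid, img_size):
        """Bilinearly sample per-voxel features from a (C, H, W) image feature map."""
        H, W = img_size
        # Normalize pixel coordinates to [-1, 1] as required by grid_sample.
        grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                            uv[:, 1] / (H - 1) * 2 - 1], dim=-1)
        grid = grid.view(1, 1, -1, 2)                            # (1, 1, N, 2)
        feats = F.grid_sample(image_feats[None], grid,
                              align_corners=True)                # (1, C, 1, N)
        feats = feats[0, :, 0].T                                 # (N, C)
        feats[~valid] = 0.0                                      # zero out points behind camera
        return feats

The sampled per-voxel image features can then be fused (e.g., concatenated) with LiDAR voxel features before occupancy decoding; the key point is that only calibrated geometry is used, not a learned depth map.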

Related Material


[pdf] [arXiv]
[bibtex]
@InProceedings{Zhang_2024_ACCV,
    author    = {Zhang, Ji and Ding, Yiran and Liu, Zixin},
    title     = {OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {3587-3604}
}