SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
Abstract
3D occupancy prediction, which aims to predict whether each point in the surrounding 3D space is occupied, is an important task for the robustness of vision-centric autonomous driving. Existing methods usually require 3D occupancy labels to produce meaningful results. However, it is very laborious to annotate the occupancy status of each voxel. In this paper, we propose SelfOcc to explore a self-supervised way to learn 3D occupancy using only video sequences. We first transform the images into the 3D space (e.g., bird's eye view) to obtain a 3D representation of the scene. We directly impose constraints on the 3D representations by treating them as signed distance fields. We can then render 2D images of previous and future frames as self-supervision signals to learn the 3D representations. We propose an MVS-embedded strategy to directly optimize the SDF-induced weights with multiple depth proposals. Our SelfOcc outperforms the previous best method, SceneRF, by 58.7% using a single frame as input on SemanticKITTI and is the first self-supervised work that produces reasonable 3D occupancy for surround cameras on nuScenes. SelfOcc produces high-quality depth and achieves state-of-the-art results on novel depth synthesis, monocular depth estimation, and surround-view depth estimation on SemanticKITTI, KITTI-2015, and nuScenes, respectively. Code: https://github.com/huang-yh/SelfOcc.
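As a rough illustration of the "SDF-induced weights" mentioned above (not the authors' implementation), the following minimal Python sketch converts SDF values sampled along a camera ray into volume-rendering weights using a NeuS-style sigmoid conversion; the function name sdf_to_weights and the sharpness parameter s are illustrative assumptions.

# A minimal sketch (not the paper's code) of turning per-ray SDF samples into
# volume-rendering weights, in the spirit of NeuS-style SDF rendering that
# self-supervised occupancy methods build on. Names here are assumptions.
import numpy as np

def sdf_to_weights(sdf: np.ndarray, s: float = 10.0) -> np.ndarray:
    """Convert SDF values along a ray into rendering weights.

    sdf: (N,) signed distances at N samples ordered from near to far.
    s:   sharpness of the sigmoid mapping SDF to opacity.
    """
    # Sigmoid of the SDF; it transitions from ~1 to ~0 across the surface.
    cdf = 1.0 / (1.0 + np.exp(-s * sdf))
    # Discrete opacity between consecutive samples (clamped to [0, 1]).
    alpha = np.clip((cdf[:-1] - cdf[1:]) / np.clip(cdf[:-1], 1e-6, None), 0.0, 1.0)
    # Standard alpha compositing: accumulated transmittance times opacity.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return alpha * trans

# Usage: rendered depth is the weight-averaged sample depth along the ray.
t = np.linspace(0.5, 40.0, 64)      # sample depths along one ray (metres)
sdf = 10.0 - t                      # toy SDF with a surface at 10 m
w = sdf_to_weights(sdf)
depth = float(np.sum(w * t[:-1]))   # close to the 10 m surface

Rendered depth (and, analogously, colour) computed from these weights can then be compared against previous and future frames to provide the photometric self-supervision described in the abstract.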
Related Material
[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Huang_2024_CVPR,
  author    = {Huang, Yuanhui and Zheng, Wenzhao and Zhang, Borui and Zhou, Jie and Lu, Jiwen},
  title     = {SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2024},
  pages     = {19946-19956}
}