OccFlowNet: Occupancy Estimation via Differentiable Rendering and Occupancy Flow

Simon Boeder, Benjamin Risse; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 306-316

Abstract


Semantic occupancy has recently gained significant traction as a prominent 3D scene representation. However, most existing camera-based methods rely on large and costly datasets with fine-grained 3D voxel labels for training, which limits their practicality and scalability. Furthermore, approaches in this domain do not model scene dynamics. In this work, we present a novel approach to occupancy estimation inspired by neural radiance fields (NeRF), using supervision in 2D based on 3D labels provided by LiDAR, which offers a more natural form of supervision than voxel labels. In particular, we employ differentiable volumetric rendering to predict depth and semantic maps and train a 3D network based on supervision in 2D space only. To enhance geometric accuracy and increase the supervisory signal, we introduce temporal rendering of adjacent time steps. Additionally, we introduce occupancy flow as a mechanism to handle dynamic objects in the scene and ensure their temporal consistency. Through extensive experimentation, we demonstrate that supervision in 2D with LiDAR can achieve state-of-the-art performance compared to methods using voxel labels, and that, when combined with 3D voxel supervision, temporal rendering, and occupancy flow, it significantly outperforms all previous occupancy estimation models. We conclude that the proposed rendering supervision and occupancy flow advance occupancy estimation.
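
The differentiable volumetric rendering mentioned in the abstract can be illustrated with a minimal NumPy sketch of standard NeRF-style compositing along a single ray. This is an assumption about the general technique, not the authors' implementation: the function name `render_ray`, the uniform step size, and the use of per-sample class probabilities are all hypothetical choices made for illustration.

```python
import numpy as np

def render_ray(sigma, sem_probs, t, step):
    """Composite depth and semantics along one ray (illustrative sketch).

    sigma:     (N,) non-negative densities at the ray samples
    sem_probs: (N, C) per-sample class probabilities
    t:         (N,) distance of each sample from the camera
    step:      scalar spacing between samples (assumed uniform)
    """
    # Probability that the ray terminates inside each sample interval.
    alpha = 1.0 - np.exp(-sigma * step)
    # Transmittance: probability that the ray reaches sample i unblocked.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    # Rendering weights; they sum to at most 1.
    weights = trans * alpha
    depth = np.sum(weights * t)       # expected termination distance
    semantics = weights @ sem_probs   # weighted class distribution, shape (C,)
    return depth, semantics, weights
```

Because every operation above is differentiable with respect to `sigma` and `sem_probs`, a loss on the rendered depth and semantic values in 2D can, in an autodiff framework, propagate gradients back to the 3D predictions.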

Related Material


@InProceedings{Boeder_2025_WACV,
    author    = {Boeder, Simon and Risse, Benjamin},
    title     = {OccFlowNet: Occupancy Estimation via Differentiable Rendering and Occupancy Flow},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {306-316}
}