@InProceedings{Lv_2026_CVPR,
    author    = {Lv, Chengxin and Li, Yihui and Yang, Hongyu and Wang, YunHong},
    title     = {Gau-Occ: Geometry-Completed Gaussians for Multi-Modal 3D Occupancy Prediction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {14198-14207}
}
Gau-Occ: Geometry-Completed Gaussians for Multi-Modal 3D Occupancy Prediction
Abstract
3D semantic occupancy prediction is crucial for autonomous driving, yet vision-only approaches suffer from weak geometric cues, and existing multi-modal frameworks often depend on dense voxel or BEV tensors that impose heavy computational cost. We present Gau-Occ, a multi-modal framework that models the scene as a compact collection of semantic 3D Gaussians, enabling geometry-guided fusion without dense volumetric processing. To enhance geometric completeness, a learned LiDAR Completion Diffuser (LCD) trained on real-world priors recovers missing structures from sparse LiDAR, and the completed points are encoded as semantic Gaussian anchors. To further integrate multi-view image semantics, we introduce Gaussian Anchor Fusion (GAF), a geometry-aligned aggregation module that performs anchor-guided 2D sampling, local neighborhood encoding, and cross-modal alignment. By constructing locally aggregated Gaussian descriptors that capture spatial consistency and semantic discriminability, GAF facilitates accurate feature association across modalities. Through anchor-driven refinement of Gaussian attributes, Gau-Occ supports detailed 3D occupancy prediction. Extensive experiments across challenging benchmarks demonstrate that Gau-Occ achieves state-of-the-art performance.
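The "anchor-guided 2D sampling" step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple pinhole camera, a single-channel feature map, and anchors already expressed in the camera frame, and all function names (`project_anchor`, `bilinear_sample`, `sample_anchor_features`) are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def project_anchor(point_cam: Point3D, fx: float, fy: float,
                   cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Project a camera-frame 3D point to pixel coordinates (pinhole model).

    Returns None when the point lies behind (or on) the image plane.
    """
    x, y, z = point_cam
    if z <= 1e-6:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def bilinear_sample(feat: Sequence[Sequence[float]],
                    u: float, v: float) -> Optional[float]:
    """Bilinearly sample an H x W feature map at column u, row v.

    Returns None when (u, v) falls outside the map.
    """
    h, w = len(feat), len(feat[0])
    if not (0.0 <= u <= w - 1 and 0.0 <= v <= h - 1):
        return None
    u0, v0 = int(u), int(v)
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    top = feat[v0][u0] * (1 - du) + feat[v0][u1] * du
    bottom = feat[v1][u0] * (1 - du) + feat[v1][u1] * du
    return top * (1 - dv) + bottom * dv


def sample_anchor_features(anchors: Sequence[Point3D],
                           feat: Sequence[Sequence[float]],
                           fx: float, fy: float,
                           cx: float, cy: float) -> List[Optional[float]]:
    """Gather one image feature per 3D anchor; None marks anchors not visible."""
    sampled: List[Optional[float]] = []
    for anchor in anchors:
        uv = project_anchor(anchor, fx, fy, cx, cy)
        sampled.append(None if uv is None else bilinear_sample(feat, *uv))
    return sampled
```

In a real multi-view setting, each anchor would be projected into every camera with that camera's own extrinsics and intrinsics, and the per-view samples would be multi-channel vectors that are then aggregated. Those details are assumptions beyond what the abstract specifies.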

