IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM

Minghao Yin, Shangzhe Wu, Kai Han; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 10563-10573

Abstract


In this paper, we address the challenging problem of visual SLAM with neural scene representations. Recently, neural scene representations have shown promise for SLAM, producing dense, high-quality 3D scene reconstructions. However, existing methods require scene-specific optimization, leading to a time-consuming mapping process for each individual scene. To overcome this limitation, we propose IBD-SLAM, an Image-Based Depth fusion framework for generalizable SLAM. In particular, we adopt a Neural Radiance Field (NeRF) for scene representation. Inspired by multi-view image-based rendering, instead of learning a fixed-grid scene representation, we propose to learn an image-based depth fusion model that fuses the depth maps of multiple reference views into an xyz-map representation. Once trained, this model can be applied to new uncalibrated monocular RGBD videos of unseen scenes without retraining, and it reconstructs full 3D scenes efficiently with a lightweight pose optimization procedure. We thoroughly evaluate IBD-SLAM on public visual SLAM benchmarks, where it outperforms the previous state of the art while being 10x faster in the mapping stage. Project page: https://visual-ai.github.io/ibd-slam.
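The abstract does not define the xyz-map in detail. Assuming it denotes a per-pixel map of 3D point coordinates, the sketch below shows only the standard pinhole back-projection that turns a single depth map into such a map. It is an illustration, not the authors' learned fusion model: the function name `depth_to_xyz_map`, the intrinsics matrix `K`, and the optional `cam_to_world` pose are all hypothetical names introduced here.

```python
import numpy as np

def depth_to_xyz_map(depth, K, cam_to_world=None):
    """Back-project an (H, W) depth map into an (H, W, 3) map of 3D points.

    Assumes a pinhole camera with 3x3 intrinsics K and depth measured along
    the camera z-axis. If a 4x4 cam_to_world pose is given, the points are
    expressed in world coordinates; otherwise they stay in the camera frame.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    xyz = np.stack([x, y, depth], axis=-1)
    if cam_to_world is not None:
        rotation = cam_to_world[:3, :3]
        translation = cam_to_world[:3, 3]
        # Apply p_world = R @ p_cam + t to every pixel at once.
        xyz = xyz @ rotation.T + translation
    return xyz

# Usage: a flat wall 4 units in front of a toy 3x3 camera.
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 3), 4.0)
xyz_map = depth_to_xyz_map(depth, K)
```

Maps from several reference views, once expressed in a common world frame, could then be combined; how IBD-SLAM learns that fusion is described in the paper itself and is not reproduced here.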

Related Material


@InProceedings{Yin_2024_CVPR,
    author    = {Yin, Minghao and Wu, Shangzhe and Han, Kai},
    title     = {IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {10563-10573}
}