CNN-SLAM: Real-Time Dense Monocular SLAM With Learned Depth Prediction

Keisuke Tateno, Federico Tombari, Iro Laina, Nassir Navab; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6243-6252

Abstract


Given the recent advances in depth prediction from Convolutional Neural Networks (CNNs), this paper investigates how depth maps predicted by a deep neural network can be used for accurate and dense monocular reconstruction. We propose a method where CNN-predicted dense depth maps are naturally fused together with depth measurements obtained from direct monocular SLAM, based on a scheme that privileges depth prediction in image locations where monocular SLAM approaches tend to fail, e.g. along low-textured regions, and vice versa. We demonstrate the use of depth prediction to estimate the absolute scale of the reconstruction, hence overcoming one of the major limitations of monocular SLAM. Finally, we propose a framework to efficiently fuse semantic labels, obtained from a single frame, with dense SLAM, so as to yield semantically coherent scene reconstruction from a single view. Evaluation results on two benchmark datasets show the robustness and accuracy of our approach.
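
The fusion and scale-recovery ideas in the abstract can be illustrated with a minimal sketch. It is not the paper's actual scheme: it assumes that each depth source comes with a per-pixel variance and combines them by inverse-variance weighting, so that the CNN prediction dominates wherever the SLAM measurement is uncertain or missing. It also assumes that absolute scale can be approximated by the median ratio between predicted and SLAM depths. The function names `fuse_depth` and `estimate_scale`, and the flat-list representation of a depth map, are hypothetical choices made for illustration.

```python
from statistics import median


def fuse_depth(cnn_depth, cnn_var, slam_depth, slam_var):
    """Fuse two per-pixel depth estimates by inverse-variance weighting.

    All arguments are flat lists of equal length (one entry per pixel).
    slam_depth[i] is None where SLAM produced no measurement, e.g. in a
    low-textured region (an assumption of this sketch).
    """
    fused, fused_var = [], []
    for d_c, v_c, d_s, v_s in zip(cnn_depth, cnn_var, slam_depth, slam_var):
        if d_s is None:
            # No SLAM measurement: fall back to the CNN prediction.
            fused.append(d_c)
            fused_var.append(v_c)
            continue
        w_c, w_s = 1.0 / v_c, 1.0 / v_s  # lower variance means higher weight
        fused.append((w_c * d_c + w_s * d_s) / (w_c + w_s))
        fused_var.append(1.0 / (w_c + w_s))
    return fused, fused_var


def estimate_scale(cnn_depth, slam_depth):
    """Estimate the factor mapping up-to-scale SLAM depths to metric depths.

    Uses the median of per-pixel ratios, which tolerates a few outliers.
    """
    ratios = [
        d_c / d_s
        for d_c, d_s in zip(cnn_depth, slam_depth)
        if d_s is not None and d_s > 0
    ]
    return median(ratios)
```

With equal variances the fused depth is the plain average of the two sources; as the SLAM variance shrinks (well-textured pixels), the result moves toward the SLAM measurement.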

Related Material


@InProceedings{Tateno_2017_CVPR,
  author    = {Tateno, Keisuke and Tombari, Federico and Laina, Iro and Navab, Nassir},
  title     = {CNN-SLAM: Real-Time Dense Monocular SLAM With Learned Depth Prediction},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {July},
  year      = {2017},
  pages     = {6243-6252}
}