3D Scene Mesh From CNN Depth Predictions and Sparse Monocular SLAM

Tomoyuki Mukasa, Jiu Xu, Bjorn Stenger; Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2017, pp. 921-928

Abstract


In this paper, we propose a novel framework for integrating geometric measurements from monocular visual simultaneous localization and mapping (SLAM) with depth predictions from a convolutional neural network (CNN). In our framework, sparse features measured by SLAM and dense depth maps predicted by the CNN are fused to obtain a more accurate dense 3D reconstruction, including scale. We continuously update an initial 3D mesh by integrating accurately tracked sparse feature points. Compared to prior work on integrating SLAM and CNN estimates [20], there are two main differences: first, our 3D mesh representation allows as-rigid-as-possible update transformations; second, we propose a system architecture suitable for mobile devices, in which the feature tracking and CNN-based depth prediction modules are separated and only the former runs on the device. We evaluate the framework by comparing its 3D reconstruction against 3D measurements obtained with an RGBD sensor, showing a 38% reduction in mean residual error compared to CNN-based depth map prediction alone.
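To make the fusion idea concrete, here is a minimal sketch, not the authors' implementation: it recovers the metric scale of a CNN-predicted depth map by least-squares alignment against sparse SLAM depths at tracked feature pixels, then anchors the map at those points. The function names, the per-pixel anchoring step, and the toy data are illustrative assumptions; the paper instead propagates the sparse corrections over a 3D mesh via an as-rigid-as-possible deformation.

import numpy as np

def fit_scale(d_cnn, d_slam):
    """Least-squares scale s minimizing ||s * d_cnn - d_slam||^2."""
    return np.dot(d_cnn, d_slam) / np.dot(d_cnn, d_cnn)

def fuse_depth(cnn_depth, feat_uv, slam_depths):
    """Rescale the dense CNN depth map, then pin it at SLAM feature depths."""
    u, v = feat_uv[:, 0], feat_uv[:, 1]
    s = fit_scale(cnn_depth[v, u], slam_depths)
    fused = s * cnn_depth
    # Pin the map to the more accurately tracked sparse SLAM measurements;
    # the paper spreads these corrections over the mesh with an
    # as-rigid-as-possible deformation rather than a per-pixel overwrite.
    fused[v, u] = slam_depths
    return fused, s

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cnn_depth = rng.uniform(1.0, 4.0, size=(480, 640))    # dense prediction, unknown scale
    feat_uv = rng.integers(0, [640, 480], size=(100, 2))  # tracked feature pixels (u, v)
    slam = 2.5 * cnn_depth[feat_uv[:, 1], feat_uv[:, 0]]  # simulated metric depths
    fused, s = fuse_depth(cnn_depth, feat_uv, slam)
    print(f"recovered scale: {s:.3f}")                    # ~2.500

A closed-form global scale fit is a natural first step because scale is the dominant unknown in monocular depth prediction; the local, spatially varying residuals are what the mesh deformation in the paper is designed to absorb.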

Related Material


[pdf]
[bibtex]
@InProceedings{Mukasa_2017_ICCV,
  author    = {Mukasa, Tomoyuki and Xu, Jiu and Stenger, Bjorn},
  title     = {3D Scene Mesh From CNN Depth Predictions and Sparse Monocular SLAM},
  booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
  month     = {Oct},
  year      = {2017}
}