DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes

Dongki Jung, Jaehoon Choi, Yonghan Lee, Deokhwa Kim, Changick Kim, Dinesh Manocha, Donghwan Lee; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 12797-12807

Abstract


We present a novel approach for estimating depth from a monocular camera as it moves through complex and crowded indoor environments, e.g., a department store or a metro station. Our approach predicts absolute-scale depth maps over the entire scene, consisting of a static background and multiple moving people, by training on dynamic scenes. Since it is difficult to collect dense depth maps from crowded indoor environments, we design our training framework without requiring ground-truth depths produced by depth-sensing devices. Our network leverages RGB images and sparse depth maps generated by traditional 3D reconstruction methods to estimate dense depth maps. We use two constraints to handle depth for non-rigidly moving people without explicitly tracking their motion. We demonstrate that our approach offers consistent improvements over recent depth estimation methods on the NAVERLABS dataset, which includes complex and crowded scenes.
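The abstract describes completing sparse depth maps (e.g., from structure-from-motion) into dense ones. As a hedged illustration of that input/output relationship only, and not the paper's learned network, the toy sketch below densifies a sparse depth map with a naive nearest-valid-pixel fill; the function name and approach are purely illustrative.

```python
import numpy as np

def naive_densify(sparse_depth):
    """Fill missing (zero) depth pixels with the nearest known depth value.

    Toy baseline for illustration only: the paper trains a network to do
    this densification; here we simply copy the closest observed value.
    """
    h, w = sparse_depth.shape
    known = np.argwhere(sparse_depth > 0)  # (N, 2) coordinates of observed depths
    if known.size == 0:
        return sparse_depth.copy()
    dense = sparse_depth.copy()
    for y in range(h):
        for x in range(w):
            if dense[y, x] == 0:
                # Squared distance to every observed pixel; take the nearest.
                d2 = (known[:, 0] - y) ** 2 + (known[:, 1] - x) ** 2
                ny, nx = known[np.argmin(d2)]
                dense[y, x] = sparse_depth[ny, nx]
    return dense
```

A learned model replaces this heuristic with image-guided inference, which is what lets it produce plausible depth for moving people where SfM yields no reliable points.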

Related Material


@InProceedings{Jung_2021_ICCV,
  author    = {Jung, Dongki and Choi, Jaehoon and Lee, Yonghan and Kim, Deokhwa and Kim, Changick and Manocha, Dinesh and Lee, Donghwan},
  title     = {DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2021},
  pages     = {12797-12807}
}