Deep Modular Network Architecture for Depth Estimation from Single Indoor Images

Seiya Ito, Naoshi Kaneko, Yuma Shinohara, Kazuhiko Sumi; Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018

Abstract

We propose a novel deep modular network architecture for indoor scene depth estimation from single RGB images. The proposed architecture consists of a main depth estimation network and two auxiliary semantic segmentation networks. Our insight is that the semantic and geometric structures of a scene are strongly correlated; we therefore utilize global (i.e. room layout) and mid-level (i.e. objects in a room) semantic structures to enhance depth estimation. The first auxiliary network, or layout network, performs room layout estimation to infer the positions of the walls, floor, and ceiling of a room. The second auxiliary network, or object network, estimates per-pixel class labels of the objects in a scene, such as furniture, to provide mid-level semantic cues. The estimated semantic structures are effectively fed into the depth estimation network through newly proposed discriminator networks, which assess the reliability of the estimated structures. Evaluation results show that our architecture achieves significant performance improvements over previous approaches on the standard NYU Depth v2 indoor scene dataset.
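The data flow described in the abstract can be sketched as follows. This is a minimal toy illustration of the modular idea (two auxiliary semantic networks whose outputs are gated by discriminator-predicted reliability scores before fusion into the depth network); the function names, class counts, and fusion rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

H, W = 8, 8  # toy image resolution

def layout_net(rgb):
    # Global cue: per-pixel room-layout scores (assume 3 classes:
    # wall / floor / ceiling). Random stand-in for a trained network.
    return np.random.rand(3, H, W)

def object_net(rgb):
    # Mid-level cue: per-pixel object-class scores (assume 10 classes,
    # e.g. furniture categories). Random stand-in for a trained network.
    return np.random.rand(10, H, W)

def discriminator(cue):
    # Scores the reliability of an estimated semantic structure in (0, 1);
    # here just a sigmoid of the mean activation as a placeholder.
    return 1.0 / (1.0 + np.exp(-cue.mean()))

def depth_net(rgb, layout_cue, object_cue, w_layout, w_object):
    # Main network: fuses RGB features with the semantic cues, each
    # weighted by its discriminator-predicted reliability.
    fused = (rgb.mean(axis=0)
             + w_layout * layout_cue.mean(axis=0)
             + w_object * object_cue.mean(axis=0))
    return fused  # per-pixel depth map of shape (H, W)

rgb = np.random.rand(3, H, W)
layout_cue = layout_net(rgb)
object_cue = object_net(rgb)
depth = depth_net(rgb, layout_cue, object_cue,
                  discriminator(layout_cue), discriminator(object_cue))
```

In the actual architecture each module is a deep CNN trained on its own task; the modular decomposition lets the semantic branches be supervised independently while the discriminators decide how much each estimated structure should influence the final depth prediction.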

Related Material

[bibtex]
@InProceedings{Ito_2018_ECCV_Workshops,
author = {Ito, Seiya and Kaneko, Naoshi and Shinohara, Yuma and Sumi, Kazuhiko},
title = {Deep Modular Network Architecture for Depth Estimation from Single Indoor Images},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV) Workshops},
month = {September},
year = {2018}
}