Generative Adversarial Networks for Depth Map Estimation From RGB Video

Kin Gwn Lore, Kishore Reddy, Michael Giering, Edgar A. Bernal; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018, pp. 1177-1185

Abstract


Depth cues are essential to achieving high-level scene understanding, and in particular to determining geometric relations between objects. The ability to reason about depth information in scene analysis tasks can often result in improved decision-making capabilities. Unfortunately, depth-capable sensors are not as ubiquitous as traditional RGB cameras, which limits the availability of depth-related cues. In this work, we investigate data-driven approaches for depth estimation from images or videos captured with monocular cameras. We propose three different approaches and demonstrate their efficacy through extensive experimental validation. The proposed methods operate on (i) a single 3-channel RGB image frame, (ii) a sequence of RGB frames, or (iii) a single RGB frame plus the optical flow field computed between it and a neighboring frame in the video stream, and map the respective inputs to an estimated depth map representation. In contrast to the existing literature, the input-output mapping is not regressed directly; rather, it is learned through adversarial training that leverages conditional generative adversarial networks (cGANs).
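
To make the adversarial formulation concrete, below is a minimal PyTorch sketch of a cGAN training step for variant (i), single-frame RGB-to-depth translation, in the spirit of pix2pix-style conditional GANs. The abstract does not specify the architecture, so all choices here (layer widths, the patch-level discriminator, the L1 weight, depth maps normalized to [-1, 1]) are illustrative assumptions rather than the authors' exact setup; variants (ii) and (iii) would change only the number of input channels fed to the generator and discriminator.

```python
# Hypothetical cGAN sketch for RGB -> depth translation. Not the paper's
# exact networks; a minimal conditional-GAN training loop for illustration.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a 3-channel RGB image to a 1-channel depth map in [-1, 1]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, rgb):
        return self.net(rgb)

class Discriminator(nn.Module):
    """Scores (RGB, depth) pairs: the conditioning RGB frame is concatenated
    with the real or generated depth map along the channel axis."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),  # patch-level logits
        )

    def forward(self, rgb, depth):
        return self.net(torch.cat([rgb, depth], dim=1))

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

def train_step(rgb, depth_gt, lambda_l1=100.0):
    # Discriminator update: real pairs -> 1, generated pairs -> 0.
    fake = G(rgb)
    d_real = D(rgb, depth_gt)
    d_fake = D(rgb, fake.detach())
    loss_d = (bce(d_real, torch.ones_like(d_real))
              + bce(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update: fool the discriminator, plus an L1 term that keeps
    # the estimated depth close to ground truth (as in pix2pix).
    d_fake = D(rgb, fake)
    loss_g = bce(d_fake, torch.ones_like(d_fake)) + lambda_l1 * l1(fake, depth_gt)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

The key contrast with direct regression is visible in `train_step`: the generator is trained not only against a reconstruction loss but also against a learned discriminator that judges whether an (RGB, depth) pair looks realistic. Inputs are assumed to be batched tensors with spatial dimensions divisible by 4, e.g. `train_step(torch.randn(8, 3, 128, 160), torch.randn(8, 1, 128, 160))`.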

Related Material


[bibtex]
@InProceedings{Lore_2018_CVPR_Workshops,
author = {Lore, Kin Gwn and Reddy, Kishore and Giering, Michael and Bernal, Edgar A.},
title = {Generative Adversarial Networks for Depth Map Estimation From RGB Video},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2018},
pages = {1177-1185}
}