Spatial Correspondence With Generative Adversarial Network: Learning Depth From Monocular Videos

Zhenyao Wu, Xinyi Wu, Xiaoping Zhang, Song Wang, Lili Ju; The IEEE International Conference on Computer Vision (ICCV), 2019, pp. 7494-7504

Abstract

Depth estimation from monocular videos has important applications in areas such as autonomous driving and robot navigation. Without knowledge of the camera pose, it is a very challenging problem, since errors in camera-pose estimation can significantly degrade the accuracy of video-based depth estimation. In this paper, we present SC-GAN, a novel network with end-to-end adversarial training for depth estimation from monocular videos that does not require estimating the camera pose or its change over time. To exploit cross-frame relations, SC-GAN includes a spatial correspondence module, which uses Smolyak sparse grids to efficiently match features across adjacent frames, and an attention mechanism that learns the importance of features in different directions. The generator in SC-GAN learns to estimate depth from the input frames, while the discriminator learns to distinguish the ground-truth depth map of the reference frame from the estimated one. Experiments on the KITTI and Cityscapes datasets show that SC-GAN achieves much more accurate depth maps than many existing state-of-the-art methods on monocular videos.
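The adversarial setup described above, where the generator produces a depth map and the discriminator scores depth maps as ground truth or estimated, can be sketched with a standard GAN objective. This is a minimal illustrative sketch, not the paper's actual implementation; all function names, scores, and shapes are assumptions.

```python
# Sketch of the adversarial objective (illustrative only): the generator G
# maps input frames to a depth map; the discriminator D outputs a score in
# (0, 1) for whether a depth map is ground truth (1) or estimated (0).
import numpy as np

def bce(pred, target):
    """Binary cross-entropy on sigmoid scores."""
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def discriminator_loss(d_real, d_fake):
    # D is trained to score ground-truth depth as 1 and estimated depth as 0.
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    # G is trained to fool D into scoring its estimated depth as real.
    return bce(d_fake, np.ones_like(d_fake))

# Hypothetical discriminator scores for a batch of 4 depth maps.
d_real = np.array([0.90, 0.80, 0.95, 0.85])  # scores on ground-truth depth
d_fake = np.array([0.10, 0.20, 0.05, 0.15])  # scores on estimated depth

print(discriminator_loss(d_real, d_fake))  # low: D separates real from fake
print(generator_loss(d_fake))              # high: G has not yet fooled D
```

In practice the generator loss is typically combined with a supervised depth-reconstruction term, so the adversarial term acts as a learned prior on realistic depth maps.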

Related Material

[bibtex]
@InProceedings{Wu_2019_ICCV,
  author    = {Wu, Zhenyao and Wu, Xinyi and Zhang, Xiaoping and Wang, Song and Ju, Lili},
  title     = {Spatial Correspondence With Generative Adversarial Network: Learning Depth From Monocular Videos},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2019}
}