Unsupervised High-Resolution Depth Learning From Videos With Dual Networks

Junsheng Zhou, Yuwang Wang, Kaihuai Qin, Wenjun Zeng; The IEEE International Conference on Computer Vision (ICCV), 2019, pp. 6872-6881


Unsupervised depth learning takes the appearance difference between a target view and a view synthesized from its adjacent frame as the supervisory signal. Since this signal comes only from the images themselves, the resolution of the training data significantly impacts performance. High-resolution images contain more fine-grained details and provide a more accurate supervisory signal. However, due to limits on memory and computation, the original images are typically down-sampled during training, which causes a heavy loss of detail and disparity accuracy. To fully exploit the information contained in high-resolution data, we propose a simple yet effective dual-network architecture that takes high-resolution images directly as input and efficiently generates high-resolution, high-accuracy depth maps. We also propose a Self-assembled Attention (SA-Attention) module to handle low-texture regions. Evaluation on the KITTI and Make3D benchmark datasets demonstrates that our method achieves state-of-the-art results in the monocular depth estimation task.
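The supervisory signal described in the abstract can be illustrated with a minimal NumPy sketch. This is a generic view-synthesis photometric loss under stated assumptions, not the paper's dual-network method: the function names `synthesize_target_view` and `photometric_loss` are hypothetical, the camera is a simple pinhole model with intrinsics `K`, images are single-channel, and nearest-neighbour sampling stands in for the differentiable bilinear sampling that training would typically need.

```python
import numpy as np


def synthesize_target_view(src, depth, K, T_tgt_to_src):
    """Warp the adjacent (source) frame into the target view.

    src:          (H, W) source image (single channel, for brevity).
    depth:        (H, W) predicted depth of the target view.
    K:            (3, 3) pinhole intrinsics (assumed shared by both frames).
    T_tgt_to_src: (4, 4) rigid transform from target to source camera frame.
    Returns the synthesized target image and a mask of valid pixels.
    """
    h, w = depth.shape
    # Homogeneous pixel coordinates of the target view, shape (3, H*W).
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)]).astype(np.float64)

    # Back-project target pixels to 3D using the predicted depth.
    pts_tgt = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Move the 3D points into the source camera frame.
    pts_src = T_tgt_to_src[:3, :3] @ pts_tgt + T_tgt_to_src[:3, 3:4]
    # Project into the source image plane.
    proj = K @ pts_src
    z = proj[2]
    in_front = z > 1e-6
    z_safe = np.where(in_front, z, 1.0)  # avoid division by zero
    ui = np.rint(proj[0] / z_safe).astype(int)
    vi = np.rint(proj[1] / z_safe).astype(int)

    # Keep only points that land inside the source image.
    valid = in_front & (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    out = np.zeros(h * w, dtype=np.float64)
    out[valid] = src[vi[valid], ui[valid]]  # nearest-neighbour sampling
    return out.reshape(h, w), valid.reshape(h, w)


def photometric_loss(target, synthesized, valid):
    """Mean absolute appearance difference over valid pixels."""
    if not valid.any():
        return 0.0
    return float(np.abs(target - synthesized)[valid].mean())
```

With a correct depth map and pose, the synthesized view matches the target and the loss approaches zero; errors in depth shift the sampled pixels and raise the loss. Because the warp operates at pixel level, down-sampled inputs quantize the sampling positions more coarsely, which is one way to see why training resolution matters for this kind of supervision.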

Related Material

author = {Zhou, Junsheng and Wang, Yuwang and Qin, Kaihuai and Zeng, Wenjun},
title = {Unsupervised High-Resolution Depth Learning From Videos With Dual Networks},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}