A Simple Baseline for Fast and Accurate Depth Estimation on Mobile Devices

Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, pp. 2466-2471

Abstract


In this paper, we propose a simple but effective encoder-decoder network for fast and accurate depth estimation on mobile devices. Unlike other depth estimation methods that rely on heavy context modeling modules, our encoder uses a fast downsampling strategy to obtain a sufficient receptive field and context at a lower computational cost. To obtain a dense prediction, a light decoder recovers the original resolution. Additionally, to improve the representational ability of the light network, we introduce a teacher-student strategy: a distillation process ensures that the student (the proposed light network) learns from the teacher. The proposed method achieves a good trade-off between latency and accuracy. We evaluated it on the MAI 2021 Monocular Depth Estimation Challenge, where it achieved a score of 129.41 and ranked first, outperforming the second-place entry by a large margin (129.41 vs. 14.51). More specifically, the proposed method achieves a si-RMSE of 0.28 in 97 ms on the Raspberry Pi 4.
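
For readers unfamiliar with the si-RMSE metric quoted above, the following is a minimal sketch of the scale-invariant RMSE as it is commonly defined in log space. It assumes the challenge uses this standard form; the function name `si_rmse`, the `eps` guard, and the flat-list inputs are illustrative choices, not taken from the paper.

```python
import math


def si_rmse(pred, target, eps=1e-8):
    """Scale-invariant RMSE in log space (common definition, assumed here).

    pred, target: equal-length sequences of positive depth values.
    eps: small constant guarding against log(0); an illustrative choice.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")

    # Per-pixel log-depth differences.
    diffs = [math.log(p + eps) - math.log(t + eps) for p, t in zip(pred, target)]
    n = len(diffs)

    mean_sq = sum(d * d for d in diffs) / n
    mean = sum(diffs) / n

    # Subtracting the squared mean removes any global scale offset.
    # Clamp at zero to absorb floating-point round-off.
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


if __name__ == "__main__":
    # A prediction that is off by a constant factor scores close to zero.
    print(si_rmse([2.0, 4.0, 8.0], [1.0, 2.0, 4.0]))
```

Because the metric ignores a global scale factor, it rewards correct relative depth structure rather than absolute depth, which suits monocular estimation where absolute scale is ambiguous.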

Related Material


[pdf]
[bibtex]
@InProceedings{Zhang_2021_CVPR,
    author    = {Zhang, Ziyu and Wang, Yicheng and Huang, Zilong and Luo, Guozhong and Yu, Gang and Fu, Bin},
    title     = {A Simple Baseline for Fast and Accurate Depth Estimation on Mobile Devices},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2021},
    pages     = {2466-2471}
}