A Simple Baseline for Fast and Accurate Depth Estimation on Mobile Devices
In this paper, we propose a simple but effective encoder-decoder network for fast and accurate depth estimation on mobile devices. Unlike other depth estimation methods that rely on heavy context modeling modules, our encoder employs a fast downsampling strategy to obtain a sufficient receptive field and context quickly. To obtain a dense prediction, a light decoder recovers the original resolution. Additionally, to improve the representational ability of the light network, we introduce a teacher-student strategy: a distillation process ensures that the student (the proposed light network) learns from the teacher. The proposed method achieves a good trade-off between latency and accuracy. We evaluated the proposed algorithm in the MAI 2021 Monocular Depth Estimation Challenge, where it achieved a score of 129.41 and ranked first, outperforming the second-place entry by a large margin (129.41 vs. 14.51). More specifically, the proposed method achieves a si-RMSE of 0.28 with a latency of 97 ms on the Raspberry Pi 4.
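For readers unfamiliar with the si-RMSE metric quoted above, the following is a minimal sketch of the commonly used scale-invariant RMSE in log-depth space. The function name `si_rmse`, the `eps` clamp, and the use of flat Python lists are illustrative assumptions; the exact variant used by the challenge evaluation may differ in details such as masking or normalization.

```python
import math


def si_rmse(pred, target, eps=1e-8):
    """Scale-invariant RMSE over two equally sized sequences of positive depths.

    Assumed definition: sqrt(mean(d^2) - mean(d)^2), with d = log(pred) - log(target).
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")

    # Per-pixel log-depth differences; eps guards against log(0) (an assumption).
    diffs = [math.log(max(p, eps)) - math.log(max(t, eps))
             for p, t in zip(pred, target)]

    n = len(diffs)
    mean_sq = sum(d * d for d in diffs) / n
    mean = sum(diffs) / n

    # Clamp at zero to absorb tiny negative values from floating-point rounding.
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


# A prediction that is off by a constant global scale incurs (near) zero error.
print(si_rmse([1.0, 2.0, 4.0], [2.0, 4.0, 8.0]))
```

Because the mean log-difference is subtracted out, the metric penalizes only errors in relative depth structure, not a global scale offset, which is why it suits monocular depth estimation, where absolute scale is ambiguous.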