Knowledge Distillation for Fast and Accurate Monocular Depth Estimation on Mobile Devices

Yiran Wang, Xingyi Li, Min Shi, Ke Xian, Zhiguo Cao; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, pp. 2457-2465

Abstract


Fast and accurate monocular depth estimation on mobile devices is challenging because accuracy must always be traded off against inference time. Most monocular depth estimation methods rely on models with large computational overhead, which makes them impractical on mobile devices. However, directly training a lightweight neural network to estimate depth can yield poor performance. To remedy this, we use knowledge distillation, transferring the knowledge and representation ability of a stronger teacher network to a lightweight student network. Experiments on the Mobile AI 2021 (MAI2021) dataset demonstrate that our solution increases the fidelity of the output depth map while maintaining fast inference. Specifically, with 94.7% fewer parameters than the teacher network, the student network's si-RMSE degrades by only 0.04. Moreover, our method ranks second in the MAI2021 Monocular Depth Estimation Challenge, with an si-RMSE of 0.2602, an RMSE of 3.25, and an inference time of 1197 ms on the Raspberry Pi 4.
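As a rough illustration of the two ideas the abstract relies on, the sketch below computes a scale-invariant RMSE and a generic distillation objective. Both are assumptions for illustration, not the paper's implementation: `si_rmse` uses the common log-space definition (the standard deviation of the log-depth difference), and `distillation_loss`, with its `alpha` weight and squared-error terms, is a hypothetical stand-in for whatever loss the authors actually use.

```python
import numpy as np


def si_rmse(pred, target, eps=1e-8):
    """Scale-invariant RMSE in log space (assumed definition, not taken from the paper)."""
    d = np.log(pred + eps) - np.log(target + eps)
    # Variance of the log difference; clamp at zero to guard against rounding error.
    var = max(np.mean(d ** 2) - np.mean(d) ** 2, 0.0)
    return float(np.sqrt(var))


def distillation_loss(student_pred, teacher_pred, target, alpha=0.5):
    """Hypothetical objective: blend a ground-truth term with a teacher-matching term."""
    task = np.mean((student_pred - target) ** 2)           # fit the ground-truth depth
    distill = np.mean((student_pred - teacher_pred) ** 2)  # imitate the teacher's output
    return float((1.0 - alpha) * task + alpha * distill)


# Example: a prediction that is off by a global scale factor has near-zero si-RMSE.
gt = np.array([1.0, 2.0, 4.0, 8.0])
scaled_error = si_rmse(2.0 * gt, gt)
```

Because the metric ignores a global scale factor, it rewards correct relative depth structure, which is one plausible reason a much smaller student can stay close to its teacher on it.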

Related Material


[bibtex]
@InProceedings{Wang_2021_CVPR,
    author    = {Wang, Yiran and Li, Xingyi and Shi, Min and Xian, Ke and Cao, Zhiguo},
    title     = {Knowledge Distillation for Fast and Accurate Monocular Depth Estimation on Mobile Devices},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2021},
    pages     = {2457-2465}
}