R-MSFM: Recurrent Multi-Scale Feature Modulation for Monocular Depth Estimating

Zhongkai Zhou, Xinnan Fan, Pengfei Shi, Yuanxue Xin; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 12777-12786

Abstract


In this paper, we propose Recurrent Multi-Scale Feature Modulation (R-MSFM), a new deep network architecture for self-supervised monocular depth estimation. R-MSFM extracts per-pixel features, builds a multi-scale feature modulation module, and iteratively updates an inverse depth map through a parameter-shared decoder at a fixed resolution. This design lets R-MSFM maintain representations that are both semantically richer and spatially more precise, and avoids the error propagation caused by the traditional U-Net-like coarse-to-fine architectures widely used in this domain, resulting in strong generalization and an efficient parameter count. Experimental results demonstrate the superiority of the proposed R-MSFM in both model size and inference speed, and show state-of-the-art results on the KITTI benchmark. Code is available at https://github.com/jsczzzk/R-MSFM
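The core idea described above (iterative refinement of an inverse depth map through a decoder whose weights are shared across iterations) can be sketched in a few lines. The sketch below is a hypothetical NumPy illustration, not the paper's implementation: the shapes, the modulation rule, and the `decoder_update` weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: feature map of C channels at a fixed H x W resolution.
C, H, W = 8, 4, 6
features = rng.standard_normal((C, H, W))

# A single parameter-shared "decoder": the SAME weights are reused at every
# iteration. The linear weights here are a toy stand-in for the real module.
W_dec = rng.standard_normal(C) * 0.01

def decoder_update(features, inv_depth):
    """Predict a residual update from features modulated by the current
    inverse-depth estimate (an illustrative stand-in for R-MSFM's module)."""
    modulated = features * inv_depth[None, :, :]          # feature modulation
    return np.tensordot(W_dec, modulated, axes=(0, 0))    # residual, H x W

# Iterative refinement at a fixed resolution: d_{k+1} = d_k + Delta_k,
# with the same decoder weights applied at every step.
inv_depth = np.full((H, W), 0.5)   # initial inverse-depth guess
for _ in range(3):
    inv_depth = inv_depth + decoder_update(features, inv_depth)

print(inv_depth.shape)  # (4, 6)
```

Because the decoder is reused rather than duplicated per scale, the parameter count stays constant no matter how many refinement iterations are run, which is consistent with the efficiency claim in the abstract.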

Related Material


[bibtex]
@InProceedings{Zhou_2021_ICCV, author = {Zhou, Zhongkai and Fan, Xinnan and Shi, Pengfei and Xin, Yuanxue}, title = {R-MSFM: Recurrent Multi-Scale Feature Modulation for Monocular Depth Estimating}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2021}, pages = {12777-12786} }