The Midas Touch for Metric Depth


Yu Ma, Zizhan Guo, Zuyi Xiong, Haoran Zhang, Yi Feng, Hongbo Zhao, Hanli Wang, Rui Fan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 5804-5813

Abstract

Recent advances have markedly improved the cross-scene generalization of relative depth estimation, yet its practical applicability remains limited by the absence of metric scale, local scale inconsistencies, and low computational efficiency. To address these issues, we present Midas Touch for Depth (MTD), a mathematically interpretable approach that converts relative depth into metric depth using only extremely sparse 3D data. To eliminate local scale inconsistencies, MTD applies a segment-wise recovery strategy via sparse graph optimization, followed by a pixel-wise refinement strategy using a discontinuity-aware geodesic cost. MTD exhibits strong generalization and achieves substantial accuracy improvements over previous depth completion and depth estimation methods. Moreover, its lightweight, plug-and-play design facilitates deployment and integration into diverse downstream 3D tasks.
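For context, the simplest way to put a relative depth map on a metric scale using sparse 3D points is a single global least-squares fit of one scale and one shift. The sketch is not MTD's method. It is the common global-alignment baseline, and it assumes that relative depth relates affinely to metric depth (for disparity-style outputs the fit would instead be done in inverse depth). The helper name `align_scale_shift` and its arguments are hypothetical.

```python
import numpy as np


def align_scale_shift(rel_depth, sparse_metric, mask):
    """Fit metric ~= s * rel + t by least squares over pixels where mask is True.

    Hypothetical helper; assumes relative depth is affinely related to metric depth.
    rel_depth:     (H, W) relative depth map.
    sparse_metric: (H, W) array holding metric depth at the valid sparse pixels.
    mask:          (H, W) boolean array marking pixels with a sparse measurement.
    Needs at least two valid pixels with distinct relative depths.
    """
    r = rel_depth[mask]
    m = sparse_metric[mask]
    # Design matrix [r, 1] so that A @ [s, t] approximates the sparse metric depths.
    A = np.stack([r, np.ones_like(r)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, m, rcond=None)
    # Apply the single global scale and shift to every pixel.
    return s * rel_depth + t, s, t


# Usage: two sparse measurements are enough to recover a global affine map.
rel = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[True, False], [False, True]])
sparse = np.where(mask, 2.0 * rel + 0.5, 0.0)  # synthetic metric values at masked pixels
dense, s, t = align_scale_shift(rel, sparse, mask)
```

Because one (s, t) pair is shared by the whole image, this baseline cannot correct the local scale inconsistencies the abstract describes; those are what MTD's segment-wise recovery and pixel-wise refinement are meant to address.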


BibTeX

@InProceedings{Ma_2026_CVPR,
    author    = {Ma, Yu and Guo, Zizhan and Xiong, Zuyi and Zhang, Haoran and Feng, Yi and Zhao, Hongbo and Wang, Hanli and Fan, Rui},
    title     = {The Midas Touch for Metric Depth},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {5804-5813}
}