MD2E: Modeling Depth-to-Edge Cues for Monocular Metric Depth Estimation

Ning, Chao; Shen, Minghe; Yokoya, Naoto

Chao Ning, Minghe Shen, Naoto Yokoya; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 5772-5782

Abstract

We study monocular metric depth estimation (MMDE) without camera intrinsics at training or inference. When focal length and scene depth vary together, depth changes are difficult to perceive in the image, yet the edge-frequency statistics exhibit systematic, scale-correlated shifts. Building on this observation, we introduce a spectral quantile estimator (SQE) that analyzes the Fourier spectrum of a predicted edge map and outputs a single score used as a proxy for metric scale. Using this estimator, we propose MD2E, a method that models depth-to-edge cues in three ways: it derives edge targets from depth annotations, calibrates metric scale using the spectral score, and uses edge predictions to regularize depth boundaries while producing metric depth. Across diverse cameras and datasets, MD2E achieves state-of-the-art MMDE performance in both zero-shot and fine-tuning settings. The project page is available at https://2j472no.github.io/MD2E/.
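
To make the idea of a spectral score concrete, the following is a minimal sketch of one way a single number could be read off the Fourier spectrum of an edge map. It is not the SQE defined in the paper, whose exact formulation the abstract does not give. The function name `spectral_quantile_score`, the quantile level `q`, and the choice of "radial frequency below which a fraction q of the spectral energy lies" are all assumptions made for illustration.

```python
import numpy as np


def spectral_quantile_score(edge_map, q=0.9):
    """Hypothetical spectral score (an assumption, not the paper's SQE).

    Returns the radial frequency (cycles per pixel) below which a
    fraction q of the edge map's spectral energy lies.
    """
    edge = np.asarray(edge_map, dtype=np.float64)
    edge = edge - edge.mean()  # drop the DC term so it does not dominate

    # 2D power spectrum of the edge map.
    power = np.abs(np.fft.fft2(edge)) ** 2

    # Radial frequency of every FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(edge.shape[0])[:, None]
    fx = np.fft.fftfreq(edge.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2).ravel()
    power = power.ravel()

    total = power.sum()
    if total == 0.0:
        return 0.0  # a constant edge map carries no frequency content

    # Energy-weighted quantile of the radial frequency.
    order = np.argsort(radius)
    cdf = np.cumsum(power[order]) / total
    idx = min(int(np.searchsorted(cdf, q)), cdf.size - 1)
    return float(radius[order][idx])


# Toy edge maps: sinusoidal stripes with 2 and 16 cycles across 64 pixels.
x = np.arange(64)
coarse = np.tile(np.cos(2 * np.pi * 2 * x / 64), (64, 1))
fine = np.tile(np.cos(2 * np.pi * 16 * x / 64), (64, 1))
coarse_score = spectral_quantile_score(coarse)
fine_score = spectral_quantile_score(fine)
```

In this toy setup the finer stripe pattern yields the larger score, which illustrates how a shift in edge-frequency statistics can be summarized by one scalar. How such a score is mapped to metric scale is specific to the paper and is not sketched here.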

Related Material

[bibtex]

@InProceedings{Ning_2026_CVPR,
    author    = {Ning, Chao and Shen, Minghe and Yokoya, Naoto},
    title     = {MD2E: Modeling Depth-to-Edge Cues for Monocular Metric Depth Estimation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {5772-5782}
}