StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth

Zheng Zhang, Lihe Yang, Tianyu Yang, Chaohui Yu, Xiaoyang Guo, Yixing Lao, Hengshuang Zhao; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 7069-7078

Abstract


Recent advances in monocular depth estimation have significantly improved robustness and accuracy. However, relative depth models exhibit flickering and 3D inconsistency on video data, limiting their use in 3D reconstruction. We introduce StableDepth, a scene-consistent and scale-invariant depth estimation method that achieves scene-level 3D consistency. Its dual-decoder architecture learns from large-scale unlabeled video data, enhancing generalization and reducing flickering. Unlike previous methods that require full video sequences, StableDepth supports online inference at 13x faster speed, delivering significant improvements across benchmarks while matching the temporal consistency of video diffusion-based estimators.
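The abstract does not spell out how scale-invariant depth is evaluated. As a hedged illustration only (not the paper's exact formulation), scale-invariant benchmarks commonly align a prediction to ground truth with a least-squares scale and shift before scoring, as in MiDaS-style affine-invariant evaluation; the function names below are illustrative, not from the paper:

```python
import numpy as np

def align_scale_shift(pred, gt):
    """Fit scale s and shift t minimizing ||s*pred + t - gt||^2,
    then return the aligned prediction (illustrative helper)."""
    A = np.stack([pred, np.ones_like(pred)], axis=1)  # design matrix [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, gt, rcond=None)   # least-squares fit
    return s * pred + t

def abs_rel(pred, gt):
    """Absolute relative error, a standard depth metric."""
    return float(np.mean(np.abs(pred - gt) / gt))

# A prediction that is only an affine transform of the ground truth
# scores perfectly after alignment, which is the point of the metric.
gt = np.linspace(1.0, 10.0, 100)
pred = 0.5 * gt - 0.2
aligned = align_scale_shift(pred, gt)
```

Under such a protocol, a model is rewarded for correct relative structure regardless of its global scale, which is why temporal flicker in that structure (the problem StableDepth targets) is visible even when per-frame scores are good.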

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Zhang_2025_ICCV,
    author    = {Zhang, Zheng and Yang, Lihe and Yang, Tianyu and Yu, Chaohui and Guo, Xiaoyang and Lao, Yixing and Zhao, Hengshuang},
    title     = {StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {7069-7078}
}