Hyper-Depth: Hypergraph-based Multi-Scale Representation Fusion for Monocular Depth Estimation

Lin Bie, Siqi Li, Yifan Feng, Yue Gao; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 5081-5090

Abstract


Monocular depth estimation (MDE) is a fundamental problem in computer vision with wide-ranging applications in various downstream tasks. While multi-scale features are perceptually critical for MDE, existing transformer-based methods have yet to leverage them explicitly. To address this limitation, we propose a hypergraph-based multi-scale representation fusion framework, Hyper-Depth. The proposed Hyper-Depth incorporates two key components: a Semantic Consistency Enhancement (SCE) module and a Geometric Consistency Constraint (GCC) module. The SCE module, designed based on hypergraph convolution, aggregates global information and enhances the representation of multi-scale patch features. Meanwhile, the GCC module provides geometric guidance to reduce over-fitting errors caused by excessive reliance on local features. In addition, we introduce a Correlation-based Conditional Random Fields (C-CRFs) module as the decoder to filter correlated patches and compute attention weights more effectively. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches across all evaluation metrics on the KITTI and NYU-Depth-v2 datasets, achieving improvements of 6.21% and 3.32% on the main metric RMSE, respectively. Furthermore, zero-shot evaluations on the nuScenes and SUN-RGBD datasets validate the generalizability of our approach.
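The abstract does not detail the SCE module's internals, but the standard spectral hypergraph convolution it builds on (X' = D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X Θ) can be sketched as follows. This is a generic illustration, not the paper's actual implementation; the function name and the identity hyperedge weights are assumptions.

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One generic spectral hypergraph convolution layer:
        X' = D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X Theta
    with identity hyperedge weights W for simplicity.
    X: (n, c) node features (e.g., patch features),
    H: (n, m) node-hyperedge incidence matrix,
    Theta: (c, c_out) learnable projection.
    """
    Dv = H.sum(axis=1)  # node degrees, shape (n,)
    De = H.sum(axis=0)  # hyperedge degrees, shape (m,)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(De, 1e-12))
    # Normalized hypergraph Laplacian smoothing operator
    A = Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt
    return A @ X @ Theta

# Example: 4 nodes, 2 hyperedges, 3-dim features projected to 2 dims
X = np.random.randn(4, 3)
H = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]])
out = hypergraph_conv(X, H, np.random.randn(3, 2))
```

Because each hyperedge connects many nodes at once, a single such layer lets every patch aggregate information from all patches sharing a hyperedge, which matches the abstract's description of aggregating global information across multi-scale patch features.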

Related Material


[bibtex]
@InProceedings{Bie_2025_ICCV,
    author    = {Bie, Lin and Li, Siqi and Feng, Yifan and Gao, Yue},
    title     = {Hyper-Depth: Hypergraph-based Multi-Scale Representation Fusion for Monocular Depth Estimation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {5081-5090}
}