The Second Monocular Depth Estimation Challenge
Abstract
This paper discusses the results for the second edition of the Monocular Depth Estimation Challenge (MDEC). This edition was open to methods using any form of supervision, including fully-supervised, self-supervised, multi-task or proxy depth. The challenge was based around the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground-truth. This includes complex natural environments, e.g. forests or fields, which are greatly underrepresented in current benchmarks. The challenge received eight unique submissions that outperformed the provided SotA baseline on at least one of the pointcloud- or image-based metrics. The top supervised submission improved relative F-Score by 27.62%, while the top self-supervised improved it by 16.61%. Supervised submissions generally leveraged large collections of datasets to improve data diversity. Self-supervised submissions instead updated the network architecture and pretrained backbones. These results represent significant progress in the field, while highlighting avenues for future research, such as reducing interpolation artifacts at depth boundaries, improving self-supervised indoor performance, and overall natural image accuracy.
Related Material
[pdf] [arXiv] [bibtex]
@InProceedings{Spencer_2023_CVPR,
  author    = {Spencer, Jaime and Qian, C. Stella and Trescakova, Michaela and Russell, Chris and Hadfield, Simon and Graf, Erich W. and Adams, Wendy J. and Schofield, Andrew J. and Elder, James and Bowden, Richard and Anwar, Ali and Chen, Hao and Chen, Xiaozhi and Cheng, Kai and Dai, Yuchao and Hoa, Huynh Thai and Hossain, Sadat and Huang, Jianmian and Jing, Mohan and Li, Bo and Li, Chao and Li, Baojun and Liu, Zhiwen and Mattoccia, Stefano and Mercelis, Siegfried and Nam, Myungwoo and Poggi, Matteo and Qi, Xiaohua and Ren, Jiahui and Tang, Yang and Tosi, Fabio and Trinh, Linh and Uddin, S. M. Nadim and Umair, Khan Muhammad and Wang, Kaixuan and Wang, Yufei and Wang, Yixing and Xiang, Mochu and Xu, Guangkai and Yin, Wei and Yu, Jun and Zhang, Qi and Zhao, Chaoqiang},
  title     = {The Second Monocular Depth Estimation Challenge},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2023},
  pages     = {3064-3076}
}