The Second Monocular Depth Estimation Challenge
Abstract
This paper discusses the results for the second edition of the Monocular Depth Estimation Challenge (MDEC). This edition was open to methods using any form of supervision, including fully-supervised, self-supervised, multi-task or proxy depth. The challenge was based around the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground-truth. This includes complex natural environments, e.g. forests or fields, which are greatly underrepresented in current benchmarks. The challenge received eight unique submissions that outperformed the provided SotA baseline on at least one of the pointcloud- or image-based metrics. The top supervised submission improved relative F-Score by 27.62%, while the top self-supervised improved it by 16.61%. Supervised submissions generally leveraged large collections of datasets to improve data diversity. Self-supervised submissions instead updated the network architecture and pretrained backbones. These results represent significant progress in the field, while highlighting avenues for future research, such as reducing interpolation artifacts at depth boundaries, improving self-supervised indoor performance, and overall natural image accuracy.
Related Material
[pdf] [arXiv] [bibtex]
@InProceedings{Spencer_2023_CVPR,
  author    = {Spencer, Jaime and Qian, C. Stella and Trescakova, Michaela and Russell, Chris and Hadfield, Simon and Graf, Erich W. and Adams, Wendy J. and Schofield, Andrew J. and Elder, James and Bowden, Richard and Anwar, Ali and Chen, Hao and Chen, Xiaozhi and Cheng, Kai and Dai, Yuchao and Hoa, Huynh Thai and Hossain, Sadat and Huang, Jianmian and Jing, Mohan and Li, Bo and Li, Chao and Li, Baojun and Liu, Zhiwen and Mattoccia, Stefano and Mercelis, Siegfried and Nam, Myungwoo and Poggi, Matteo and Qi, Xiaohua and Ren, Jiahui and Tang, Yang and Tosi, Fabio and Trinh, Linh and Uddin, S. M. Nadim and Umair, Khan Muhammad and Wang, Kaixuan and Wang, Yufei and Wang, Yixing and Xiang, Mochu and Xu, Guangkai and Yin, Wei and Yu, Jun and Zhang, Qi and Zhao, Chaoqiang},
  title     = {The Second Monocular Depth Estimation Challenge},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2023},
  pages     = {3064-3076}
}