DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

Ling, Lu; Sheng, Yichen; Tu, Zhi; Zhao, Wentian; Xin, Cheng; Wan, Kun; Yu, Lantao; Guo, Qianyu; Yu, Zixun; Lu, Yawen; Li, Xuanmao; Sun, Xingpeng; Ashok, Rohan; Mukherjee, Aniruddha; Kang, Hao; Kong, Xiangrui; Hua, Gang; Zhang, Tianyi; Benes, Bedrich; Bera, Aniket

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 22160-22169

Abstract

We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS). However, existing scene-level datasets for deep learning-based 3D vision, limited to either synthetic environments or a narrow selection of real-world scenes, are quite insufficient. This insufficiency not only hinders a comprehensive benchmark of existing methods but also caps what could be explored in deep learning-based 3D analysis. To address this critical gap, we present DL3DV-10K, a large-scale scene dataset featuring 51.2 million frames from 10,510 videos captured from 65 types of point-of-interest (POI) locations, covering both bounded and unbounded scenes with different levels of reflection, transparency, and lighting. We conducted a comprehensive benchmark of recent NVS methods on DL3DV-10K, which revealed valuable insights for future research in NVS. In addition, we have obtained encouraging results in a pilot study to learn generalizable NeRF from DL3DV-10K, which manifests the necessity of a large-scale scene-level dataset to forge a path toward a foundation model for learning 3D representation. Our DL3DV-10K dataset, benchmark results, and models will be publicly accessible.

Related Material

[bibtex]

@InProceedings{Ling_2024_CVPR,
    author    = {Ling, Lu and Sheng, Yichen and Tu, Zhi and Zhao, Wentian and Xin, Cheng and Wan, Kun and Yu, Lantao and Guo, Qianyu and Yu, Zixun and Lu, Yawen and Li, Xuanmao and Sun, Xingpeng and Ashok, Rohan and Mukherjee, Aniruddha and Kang, Hao and Kong, Xiangrui and Hua, Gang and Zhang, Tianyi and Benes, Bedrich and Bera, Aniket},
    title     = {DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {22160-22169}
}