PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video

Dong Wu, Zike Yan, Hongbin Zha; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 21507-21518

Abstract


We introduce the Panoptic 3D Reconstruction task, a unified and holistic scene understanding task for monocular video, and present PanoRecon, a novel framework that addresses this new task by realizing online geometry reconstruction along with dense semantic and instance labeling. Specifically, PanoRecon incrementally performs panoptic 3D reconstruction for each video fragment, consisting of multiple consecutive key frames, from a volumetric feature representation using feed-forward neural networks. We adopt a depth-guided back-projection strategy to sparsify and purify the volumetric feature representation. We further introduce a voxel clustering module to obtain object instances in each local fragment, and then design a tracking and fusion algorithm that integrates instances from different fragments to ensure temporal coherence. This design enables PanoRecon to yield a coherent and accurate panoptic 3D reconstruction. Experiments on ScanNetV2 demonstrate geometry reconstruction that is highly competitive with state-of-the-art reconstruction methods, as well as promising 3D panoptic segmentation, from RGB input alone and in real time. Code is available at: https://github.com/Riser6/PanoRecon.
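
To make the fragment-wise pipeline more concrete, the sketch below illustrates the depth-guided back-projection idea mentioned in the abstract: 2D keyframe features are lifted into a voxel grid, and only voxels lying within a truncation band around the (estimated) depth surface are retained, keeping the volumetric feature representation sparse. This is an illustrative sketch, not the released implementation; the function name, tensor shapes, and the truncation parameter are assumptions.

    # Illustrative sketch (not the authors' code) of depth-guided back-projection:
    # lift 2D features into a voxel grid, keeping only voxels near observed depth.
    import numpy as np

    def depth_guided_backprojection(feat_2d, depth, intrinsics, cam_to_world,
                                    voxel_origin, voxel_size, grid_dim, trunc=0.12):
        """Accumulate 2D features into voxels close to the depth surface.

        feat_2d      : (C, H, W) image feature map of one keyframe
        depth        : (H, W) depth map (e.g. from a lightweight depth head)
        intrinsics   : (3, 3) camera intrinsic matrix
        cam_to_world : (4, 4) camera-to-world pose
        voxel_origin : (3,) world coordinate of voxel (0, 0, 0)
        voxel_size   : voxel edge length in meters
        grid_dim     : (X, Y, Z) number of voxels per axis
        trunc        : truncation band (meters) around the depth surface (assumed value)
        """
        C, H, W = feat_2d.shape
        X, Y, Z = grid_dim

        # World coordinates of all voxel centers.
        xs, ys, zs = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij")
        centers = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3) * voxel_size + voxel_origin

        # Transform voxel centers into the camera frame.
        world_to_cam = np.linalg.inv(cam_to_world)
        homog = np.concatenate([centers, np.ones((centers.shape[0], 1))], axis=1)
        cam_pts = (world_to_cam @ homog.T).T[:, :3]

        # Project voxel centers into the image plane.
        z = cam_pts[:, 2]
        valid = z > 1e-3
        uv = (intrinsics @ cam_pts.T).T
        u = np.round(uv[:, 0] / np.maximum(z, 1e-3)).astype(int)
        v = np.round(uv[:, 1] / np.maximum(z, 1e-3)).astype(int)
        valid &= (u >= 0) & (u < W) & (v >= 0) & (v < H)

        # Depth guidance: retain only voxels whose camera depth lies within a
        # truncation band around the observed depth at the projected pixel.
        obs = np.zeros_like(z)
        obs[valid] = depth[v[valid], u[valid]]
        near_surface = valid & (obs > 0) & (np.abs(z - obs) < trunc)

        # Scatter 2D features into the retained (sparse) voxels.
        vol_feat = np.zeros((X * Y * Z, C), dtype=feat_2d.dtype)
        vol_feat[near_surface] = feat_2d[:, v[near_surface], u[near_surface]].T
        return vol_feat.reshape(X, Y, Z, C), near_surface.reshape(X, Y, Z)

In the full method, such per-keyframe features would be fused across the consecutive key frames of a fragment before feed-forward decoding of geometry, semantics, and instances, followed by the cross-fragment instance tracking and fusion described above.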

Related Material


[bibtex]
@InProceedings{Wu_2024_CVPR,
    author    = {Wu, Dong and Yan, Zike and Zha, Hongbin},
    title     = {PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {21507-21518}
}