Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation

Keonhee Han, Dominik Muhle, Felix Wimbauer, Daniel Cremers; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 9837-9847

Abstract


Inferring scene geometry from images via Structure from Motion is a long-standing and fundamental problem in computer vision. While classical approaches and, more recently, depth map prediction focus only on the visible parts of a scene, the task of scene completion aims to reason about geometry even in occluded regions. With the popularity of neural radiance fields (NeRF), implicit representations have also become popular for scene completion by predicting so-called density fields. Unlike explicit approaches, e.g. voxel-based methods, density fields also allow for accurate depth prediction and novel-view synthesis via image-based rendering. In this work, we propose to fuse the scene reconstructions from multiple images and distill this knowledge into a more accurate single-view scene reconstruction. To this end, we introduce MVBTS, which fuses density fields from multiple posed images and is trained fully self-supervised from image data alone. Using knowledge distillation, we then train a single-view scene completion network, called KDBTS, via direct supervision from MVBTS. KDBTS achieves state-of-the-art performance on occupancy prediction, especially in occluded regions.
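To make the distillation idea concrete, below is a minimal, hypothetical sketch of the supervision loop the abstract describes: a frozen multi-view teacher (standing in for MVBTS) predicts densities at sampled 3D points, and these predictions directly supervise a single-view student (standing in for KDBTS). All module names, network shapes, and the MSE loss choice are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of density-field knowledge distillation.
# Names and architecture are illustrative, not the paper's code.
import torch
import torch.nn as nn

class DensityField(nn.Module):
    """Toy stand-in for a density-field network: maps 3D query points
    (image conditioning omitted for brevity) to non-negative densities."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # densities must be >= 0
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        return self.mlp(points).squeeze(-1)

teacher = DensityField()   # multi-view model (MVBTS role), kept frozen
student = DensityField()   # single-view model (KDBTS role), being trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

points = torch.rand(4096, 3)          # sampled 3D query points
with torch.no_grad():
    target_density = teacher(points)  # teacher output as pseudo ground truth

# Direct supervision: match student densities to the teacher's.
loss = nn.functional.mse_loss(student(points), target_density)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the teacher is only queried, not updated, the student can be trained with dense point-wise supervision even where a single input view provides no photometric signal, which is the motivation for distilling into occluded regions.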

Related Material


@InProceedings{Han_2024_CVPR,
    author    = {Han, Keonhee and Muhle, Dominik and Wimbauer, Felix and Cremers, Daniel},
    title     = {Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {9837-9847}
}