Unsupervised Learning of 3D Object Categories From Videos in the Wild

Philipp Henzler, Jeremy Reizenstein, Patrick Labatut, Roman Shapovalov, Tobias Ritschel, Andrea Vedaldi, David Novotny; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 4700-4709

Abstract


Recently, numerous works have attempted to learn 3D reconstructors of textured 3D models of visual categories given a training set of annotated static images of objects. In this paper, we seek to decrease the amount of needed supervision by leveraging a collection of object-centric videos captured in-the-wild without requiring any manual 3D annotations. Since existing category-centric datasets are insufficient for this problem, we contribute a large-scale crowd-sourced dataset of object-centric videos suitable for this task. We further propose a novel method that learns via differentiable rendering of a predicted implicit surface of the scene. Here, inspired by classic multi-view stereo methods, our key technical contribution is a novel warp-conditioned implicit shape function, which is robust to the noise in the structure-from-motion (SfM) reconstructions of the videos that supervise our learning. Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on two existing benchmarks and on our novel dataset.
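The multi-view-stereo idea behind warp-conditioning can be illustrated with a toy NumPy sketch: a 3D query point is projected ("warped") into several source views whose cameras are known, as an SfM reconstruction would provide, and the image features found at those pixels are pooled into a conditioning vector for that point. Everything here is an assumption made for illustration: the function names `sample_nearest` and `warp_conditioned_features` are hypothetical, the pinhole camera layout (`K`, `R`, `t`) is a standard convention, and nearest-neighbour sampling with mean pooling stands in for whatever learned features and aggregation the authors actually use. It is not the paper's implementation.

```python
import numpy as np


def sample_nearest(feat, uv):
    """Nearest-neighbour lookup in an (H, W, C) feature map at pixel coords uv (N, 2).

    Returns the sampled features (N, C), zero where the pixel falls outside
    the image, and a boolean mask of in-bounds samples.
    """
    H, W, C = feat.shape
    # Clip before casting so far-away projections cannot overflow the int cast.
    cols = np.clip(np.round(uv[:, 0]), -1, W).astype(int)
    rows = np.clip(np.round(uv[:, 1]), -1, H).astype(int)
    valid = (cols >= 0) & (cols < W) & (rows >= 0) & (rows < H)
    out = np.zeros((uv.shape[0], C))
    out[valid] = feat[rows[valid], cols[valid]]
    return out, valid


def warp_conditioned_features(points, views, eps=1e-6):
    """Hypothetical helper: average features from every view a 3D point projects into.

    points: (N, 3) world-space query points.
    views:  list of dicts with pinhole intrinsics "K" (3x3), world-to-camera
            rotation "R" (3x3), translation "t" (3,), and a feature map
            "feat" (H, W, C). All views are assumed to share the same C.
    Returns (mean_features (N, C), number_of_views_that_saw_each_point (N,)).
    """
    n = points.shape[0]
    c = views[0]["feat"].shape[2]
    acc = np.zeros((n, c))
    count = np.zeros(n)
    for v in views:
        cam = points @ v["R"].T + v["t"]          # world -> camera frame
        in_front = cam[:, 2] > eps                # skip points behind the camera
        uvw = cam @ v["K"].T                      # camera -> homogeneous pixels
        uv = np.full((n, 2), -1.0)                # sentinel pixel outside the image
        uv[in_front] = uvw[in_front, :2] / uvw[in_front, 2:3]
        f, valid = sample_nearest(v["feat"], uv)
        valid &= in_front
        acc[valid] += f[valid]
        count += valid
    mean = acc / np.maximum(count, 1)[:, None]    # points seen by no view get zeros
    return mean, count


if __name__ == "__main__":
    K = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [0.0, 0.0, 1.0]])
    feat = np.arange(16, dtype=float).reshape(4, 4, 1)  # feature = row * 4 + col
    views = [
        {"K": K, "R": np.eye(3), "t": np.zeros(3), "feat": feat},
        {"K": K, "R": np.eye(3), "t": np.array([1.0, 0.0, 0.0]), "feat": feat},
    ]
    pts = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
    mean, count = warp_conditioned_features(pts, views)
    # First point hits features 10 and 11 (mean 10.5); second is behind both cameras.
    print(mean[:, 0], count)
```

Mean pooling is used here only because it is order-invariant and copes with a varying number of views in which a point is visible; it is an illustrative choice, not a statement about how the authors aggregate per-view evidence.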

Related Material


@InProceedings{Henzler_2021_CVPR,
    author    = {Henzler, Philipp and Reizenstein, Jeremy and Labatut, Patrick and Shapovalov, Roman and Ritschel, Tobias and Vedaldi, Andrea and Novotny, David},
    title     = {Unsupervised Learning of 3D Object Categories From Videos in the Wild},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {4700-4709}
}