- [pdf] [supp] [arXiv]
Unsupervised Learning of 3D Object Categories From Videos in the Wild
Recently, numerous works have attempted to learn 3D reconstructors of textured 3D models of visual categories given a training set of annotated static images of objects. In this paper, we seek to decrease the amount of needed supervision by leveraging a collection of object-centric videos captured in-the-wild without requiring any manual 3D annotations. Since existing category-centric datasets are insufficient for this problem, we contribute with a large-scale crowd-sourced dataset of object-centric videos suitable for this task. We further propose a novel method that learns via differentiable rendering of a predicted implicit surface of the scene. Here, inspired by classic multi-view stereo methods, our key technical contribution is a novel warp-conditioned implicit shape function, which is robust to the noise in the SfM video reconstructions that supervise our learning. Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on 2 existing benchmarks and on our novel dataset.