Reconstructing Animatable Categories From Videos

Gengshan Yang, Chaoyang Wang, N. Dinesh Reddy, Deva Ramanan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 16995-17005

Abstract


Building animatable 3D models is challenging due to the need for 3D scans, laborious registration, and manual rigging. Recently, differentiable rendering has provided a pathway to obtain high-quality 3D models from monocular videos, but these models are limited to rigid categories or single instances. We present RAC, a method that builds category-level 3D models from monocular videos, disentangling variation across instances from motion over time. Three key ideas are introduced to solve this problem: (1) specializing a category-level skeleton to individual instances, (2) a latent-space regularization that encourages shared structure across a category while preserving instance details, and (3) using 3D background models to disentangle objects from the background. We build 3D models of humans, cats, and dogs from monocular videos. Project page: gengshan-y.github.io/rac-www/
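The abstract's central idea is separating what varies across instances (morphology) from what varies over time (articulation), with a regularizer tying instance codes to a shared category prior. The PyTorch sketch below is a minimal, illustrative stand-in for that disentanglement, not the paper's actual RAC pipeline: the class and embedding names (CategoryShapeModel, inst_codes, pose_codes), the implicit-SDF architecture, and the simple zero-mean L2 prior on instance codes are all assumptions made for illustration.

import torch
import torch.nn as nn

class CategoryShapeModel(nn.Module):
    """Toy illustration: a category-level implicit shape conditioned on a
    per-instance code (morphology) and a per-frame pose code (articulation).
    Names and architecture are illustrative, not the paper's implementation."""

    def __init__(self, num_videos, num_frames, inst_dim=32, pose_dim=16, hidden=128):
        super().__init__()
        # One learnable code per video (instance) and one per frame (time/pose).
        self.inst_codes = nn.Embedding(num_videos, inst_dim)
        self.pose_codes = nn.Embedding(num_frames, pose_dim)
        # Shared MLP maps a 3D point plus both codes to a signed-distance value.
        self.mlp = nn.Sequential(
            nn.Linear(3 + inst_dim + pose_dim, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pts, video_id, frame_id):
        # pts: (N, 3) query points.
        beta = self.inst_codes(video_id).expand(pts.shape[0], -1)
        theta = self.pose_codes(frame_id).expand(pts.shape[0], -1)
        return self.mlp(torch.cat([pts, beta, theta], dim=-1))

    def instance_code_prior(self):
        # Stand-in for the latent-space regularization: pull instance codes
        # toward a shared zero-mean category prior, so structure is shared
        # across instances while each code still captures instance detail.
        return self.inst_codes.weight.pow(2).mean()


# Usage: a reconstruction loss (from differentiable rendering, omitted here)
# would be combined with the instance-code prior during optimization.
model = CategoryShapeModel(num_videos=10, num_frames=500)
pts = torch.randn(1024, 3)
sdf = model(pts, torch.tensor(0), torch.tensor(42))
loss = sdf.abs().mean() + 1e-3 * model.instance_code_prior()
loss.backward()

In this toy setup, swapping the instance code while holding the pose code fixed changes the reconstructed body shape, and vice versa, which is the disentanglement the abstract describes.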

Related Material


@InProceedings{Yang_2023_CVPR,
    author    = {Yang, Gengshan and Wang, Chaoyang and Reddy, N. Dinesh and Ramanan, Deva},
    title     = {Reconstructing Animatable Categories From Videos},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {16995-17005}
}