KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos

Novotny, David; Rocco, Ignacio; Sinha, Samarth; Carlier, Alexandre; Kerchenbaum, Gael; Shapovalov, Roman; Smetanin, Nikita; Neverova, Natalia; Graham, Benjamin; Vedaldi, Andrea

David Novotny, Ignacio Rocco, Samarth Sinha, Alexandre Carlier, Gael Kerchenbaum, Roman Shapovalov, Nikita Smetanin, Natalia Neverova, Benjamin Graham, Andrea Vedaldi; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 5595-5604

Abstract

We consider the problem of reconstructing the depth of dynamic objects from videos. Recent progress in dynamic video depth prediction has focused on improving the output of monocular depth estimators by means of multi-view constraints, while imposing little to no restriction on the deformation of the dynamic parts of the scene. However, the theory of Non-Rigid Structure from Motion prescribes constraining the deformations for 3D reconstruction. We thus propose a new model that departs significantly from this prior work. The idea is to fit a dynamic point cloud to the video data, using Sinkhorn's algorithm to associate the 3D points with 2D pixels and a differentiable point renderer to ensure that the 3D deformations are compatible with the measured optical flow. In this manner, our algorithm, called Keypoint Transporter, models the overall deformation of the object within the entire video, so it can constrain the reconstruction correspondingly. Compared to weaker deformation models, this significantly reduces the reconstruction ambiguity and, for dynamic objects, allows Keypoint Transporter to obtain reconstructions of quality superior or at least comparable to that of prior approaches, while being much faster and reliant on a pre-trained monocular depth estimator network. To assess the method, we evaluate it on new datasets of synthetic videos depicting dynamic humans and animals with ground-truth depth. We also show qualitative results on crowd-sourced real-world videos of pets.
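To make the association step concrete, the following is a minimal sketch of Sinkhorn's algorithm matching projected 3D points to 2D pixel locations. It is not the paper's implementation: the pinhole camera with unit focal length, the squared-distance cost, the uniform marginals, the regularisation strength `epsilon`, and the helper names `project` and `sinkhorn` are all assumptions made for illustration.

```python
import numpy as np


def project(points_3d, focal=1.0):
    """Pinhole projection (assumed camera model): (x, y, z) -> focal * (x/z, y/z)."""
    return focal * points_3d[:, :2] / points_3d[:, 2:3]


def sinkhorn(cost, epsilon=0.1, n_iters=100):
    """Entropy-regularised transport plan with uniform marginals (an assumption)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # mass carried by each 3D point
    b = np.full(m, 1.0 / m)          # mass received by each pixel
    K = np.exp(-cost / epsilon)      # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows to match a
        v = b / (K.T @ u)            # rescale columns to match b
    return u[:, None] * K * v[None, :]


# Toy data: four 3D points at different depths whose projections
# form the corners of a square in the image plane.
points_3d = np.array([
    [-1.0, -1.0, 2.0],
    [2.0, -2.0, 4.0],
    [-1.5, 1.5, 3.0],
    [1.0, 1.0, 2.0],
])
projected = project(points_3d)

# Observed pixels: the same locations in shuffled order (hypothetical input).
perm = np.array([2, 0, 3, 1])
pixels = projected[perm]

# Pairwise squared distance between every projected point and every pixel.
diff = projected[:, None, :] - pixels[None, :, :]
cost = (diff ** 2).sum(axis=-1)

plan = sinkhorn(cost)
assignment = plan.argmax(axis=1)     # hard match: most probable pixel per 3D point
print(assignment)
```

In this toy setting each row of `plan` concentrates almost all of its mass on the pixel that the corresponding 3D point projects onto, so `pixels[assignment]` recovers `projected`. A soft plan of this kind is differentiable with respect to the cost, which is what makes it usable inside a gradient-based fitting loop.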

Related Material

[bibtex]

@InProceedings{Novotny_2022_CVPR,
    author    = {Novotny, David and Rocco, Ignacio and Sinha, Samarth and Carlier, Alexandre and Kerchenbaum, Gael and Shapovalov, Roman and Smetanin, Nikita and Neverova, Natalia and Graham, Benjamin and Vedaldi, Andrea},
    title     = {KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {5595-5604}
}