Physics-Based Human Motion Estimation and Synthesis From Videos

Kevin Xie, Tingwu Wang, Umar Iqbal, Yunrong Guo, Sanja Fidler, Florian Shkurti; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 11532-11541


Human motion synthesis is an important problem for applications in graphics and gaming, and even in simulation environments for robotics. Existing methods require accurate motion capture data for training, which is costly to obtain. Instead, we propose a framework for training generative models of physically plausible human motion directly from monocular RGB videos, which are much more widely available. At the core of our method is a novel optimization formulation that corrects imperfect image-based pose estimates by enforcing physics constraints and reasoning about contacts in a differentiable way. This optimization yields corrected 3D poses and motions, as well as their corresponding contact forces. Results show that our physically-corrected motions significantly outperform prior work on pose estimation. We then train a generative model to synthesize both future motion and contact forces. We demonstrate, both qualitatively and quantitatively, that our method achieves significantly improved motion synthesis quality and physical plausibility on the large-scale Human3.6M dataset compared to prior learning-based kinematic and physics-based methods. By learning directly from video, our method paves the way for large-scale, realistic and diverse motion synthesis not previously possible.
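The core idea of jointly optimizing corrected poses and contact forces under physics constraints can be illustrated with a toy sketch. The following is NOT the paper's actual formulation (which operates on full-body SMPL-style poses); it is a hypothetical 1-D point-mass example, where noisy observed heights are corrected so that finite-difference dynamics, non-negative contact forces, and a contact complementarity penalty all hold:

```python
import numpy as np
from scipy.optimize import minimize

# Toy setup (all names and constants are illustrative, not from the paper):
# a 1-D point mass of mass m observed at noisy heights z_obs over T frames.
m, g, dt, T = 1.0, 9.81, 0.1, 10
rng = np.random.default_rng(0)
z_true = np.maximum(0.0, 1.0 - 0.5 * g * (np.arange(T) * dt) ** 2)  # free fall, then rest
z_obs = z_true + 0.02 * rng.standard_normal(T)

def objective(x):
    """Jointly score corrected heights z and per-frame contact forces f."""
    z, f = x[:T], x[T:]
    acc = (z[2:] - 2 * z[1:-1] + z[:-2]) / dt**2      # finite-difference acceleration
    dyn = m * acc - (f[1:-1] - m * g)                 # Newton's second law residual
    data = z - z_obs                                  # stay close to observations
    comp = f * np.maximum(z, 0.0)                     # complementarity: force only at contact
    return np.sum(data**2) + 10.0 * np.sum(dyn**2) + np.sum(comp**2)

x0 = np.concatenate([z_obs, np.zeros(T)])
bounds = [(None, None)] * T + [(0.0, None)] * T       # contact forces can only push
res = minimize(objective, x0, bounds=bounds)
z_corrected, f_contact = res.x[:T], res.x[T:]
```

The optimizer trades off data fidelity against physical consistency, recovering both a smoother trajectory and the forces that explain it; the paper's method performs an analogous (but differentiable, full-body) trade-off over articulated human motion.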

Related Material

[pdf] [supp] [arXiv]
@InProceedings{Xie_2021_ICCV,
    author    = {Xie, Kevin and Wang, Tingwu and Iqbal, Umar and Guo, Yunrong and Fidler, Sanja and Shkurti, Florian},
    title     = {Physics-Based Human Motion Estimation and Synthesis From Videos},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11532-11541}
}