TravelNet: Self-Supervised Physically Plausible Hand Motion Learning From Monocular Color Images

Zimeng Zhao, Xi Zhao, Yangang Wang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 11666-11676

Abstract


This paper aims to reconstruct physically plausible hand motion from monocular color images. Existing frame-by-frame estimation approaches cannot directly guarantee physical plausibility, suffering from artifacts such as penetration and jittering. In this paper, we embed physical constraints on the per-frame estimated motions in both the spatial and temporal domains. Our key idea is to adopt a self-supervised learning strategy to train a novel encoder-decoder, named TravelNet, whose training motion data are prepared by a physics engine using discrete pose states. Inspired by the concept of keyframes in animation, TravelNet captures key pose states from hand motion sequences as compact motion descriptors. It extracts those key states from perturbed input without manual annotations and reconstructs motions that preserve both detail and physical plausibility. In the experiments, we show that the outputs of TravelNet exhibit both finger synergy and temporal consistency. With the proposed framework, hand motions can be accurately reconstructed and flexibly re-edited, outperforming state-of-the-art methods.
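
The abstract does not detail TravelNet's architecture, but its core idea, an encoder-decoder that compresses a motion sequence into a few key pose states and reconstructs the full motion from them, can be illustrated with a minimal sketch. The PyTorch code below is a hypothetical approximation only: the pose dimension (48), sequence length (64), number of key states (8), layer choices, and loss weight are all illustrative assumptions, not the authors' design, and the physics-engine supervision described in the paper is not modeled.

# Minimal sketch of a keyframe-style motion encoder-decoder, loosely
# following the abstract's description. All dimensions, layers, and
# weights are illustrative assumptions, not the authors' architecture.
import torch
import torch.nn as nn

class TravelNetSketch(nn.Module):
    def __init__(self, pose_dim=48, seq_len=64, num_keys=8, hidden=256):
        super().__init__()
        # Encoder: temporal convolutions squeeze the sequence into
        # `num_keys` compact key-state descriptors.
        self.encoder = nn.Sequential(
            nn.Conv1d(pose_dim, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(num_keys),
            nn.Conv1d(hidden, pose_dim, kernel_size=1),
        )
        # Decoder: interpolate the key states back to the full sequence
        # and refine with further temporal convolutions.
        self.decoder = nn.Sequential(
            nn.Upsample(size=seq_len, mode='linear', align_corners=True),
            nn.Conv1d(pose_dim, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, pose_dim, kernel_size=5, padding=2),
        )

    def forward(self, poses):          # poses: (B, T, pose_dim)
        x = poses.transpose(1, 2)      # (B, pose_dim, T)
        keys = self.encoder(x)         # (B, pose_dim, num_keys)
        recon = self.decoder(keys)     # (B, pose_dim, T)
        return keys.transpose(1, 2), recon.transpose(1, 2)

def self_supervised_loss(recon, target, w_smooth=0.1):
    # Reconstruction plus a first-difference smoothness term that
    # discourages jitter; the weighting is an arbitrary placeholder.
    rec = torch.mean((recon - target) ** 2)
    smooth = torch.mean((recon[:, 1:] - recon[:, :-1]) ** 2)
    return rec + w_smooth * smooth

# Usage: encode a batch of noisy per-frame pose estimates and train
# against the inputs themselves (self-supervised reconstruction).
model = TravelNetSketch()
poses = torch.randn(4, 64, 48)
keys, recon = model(poses)
loss = self_supervised_loss(recon, poses)

The smoothness term here stands in for the temporal-consistency objective mentioned in the abstract; the paper's actual training signal additionally comes from physics-engine-generated motion data.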

Related Material


@InProceedings{Zhao_2021_ICCV,
    author    = {Zhao, Zimeng and Zhao, Xi and Wang, Yangang},
    title     = {TravelNet: Self-Supervised Physically Plausible Hand Motion Learning From Monocular Color Images},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11666-11676}
}