A Sequential Learning-Based Approach for Monocular Human Performance Capture

Jianchun Chen, Jayakorn Vongkulbhisal, Fernando De la Torre Frade; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 3514-3523

Abstract


Human performance capture from RGB videos in unconstrained environments has become very popular for applications that require generating virtual avatars or digital actors. State-of-the-art (SOTA) methods use neural network (NN) techniques to estimate the shape directly from photos, yielding a simplified model of the human body. While effective, NN techniques frequently fail under challenging poses and do not preserve temporal consistency. Optimization-based methods such as shape-from-silhouette, on the other hand, can produce more precise reconstructions, but they typically require a good initialization and are computationally more expensive than NN methods. To address these issues, this work proposes a learning-based approach for optimizing a fine-grained shape representation (e.g., clothes, wrinkles) from a monocular RGB video. Our main idea is to sequentially recover different levels of shape detail (e.g., average shape, clothing, wrinkles) using separate neural networks. At each level, a network takes the sparse, noisy gradients of the body mesh vertices with respect to the shape and predicts dense gradients to update the body shape. Despite being trained on synthetic data, these networks generalize surprisingly well to real images. Experimental validation shows that our approach outperforms NN approaches in recovering shape details while being an order of magnitude faster than optimization-based methods, and that it is robust across varied poses and novel views.
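The gradient-densification idea described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a 1D ring of per-vertex radial offsets in place of a body mesh, a visibility mask in place of a silhouette loss (so the gradients are sparse but noise-free), and a fixed tent-weighted averaging operator (the hypothetical `densify`) in place of each trained network. The names `sparse_gradient`, `densify`, and `sequential_fit`, as well as the radii, step count, and learning rate, are illustrative assumptions. Each level spreads the sparse gradients to unobserved vertices and updates the shape, with coarser levels smoothing more; this only loosely mirrors the average shape, clothing, and wrinkles sequence.

```python
import math


def sparse_gradient(shape, target, visible):
    """Per-vertex gradient of 0.5 * (shape - target)**2, observed only at visible vertices.

    Hypothetical stand-in for the sparse per-vertex gradients a silhouette or
    rendering loss would supply; unobserved vertices receive 0.
    """
    return [(s - t) if v else 0.0 for s, t, v in zip(shape, target, visible)]


def densify(grad, visible, radius):
    """Hypothetical stand-in for a trained densifier network (not the paper's model).

    Every vertex on a closed ring takes a tent-weighted average of the
    gradients at visible vertices within `radius` hops, so unobserved vertices
    also receive an update. A larger radius yields smoother, coarser updates.
    """
    n = len(grad)
    dense = []
    for i in range(n):
        total, weight_sum = 0.0, 0.0
        for k in range(-radius, radius + 1):
            j = (i + k) % n
            if visible[j]:
                w = radius + 1 - abs(k)  # triangular (tent) weights
                total += w * grad[j]
                weight_sum += w
        dense.append(total / weight_sum if weight_sum else 0.0)
    return dense


def sequential_fit(target, visible, radii=(8, 3, 1), steps=30, lr=0.5):
    """Coarse-to-fine refinement with one 'level' per radius, applied in order."""
    shape = [0.0] * len(target)
    for radius in radii:  # loosely: average shape -> clothing -> wrinkles
        for _ in range(steps):
            sparse = sparse_gradient(shape, target, visible)
            dense = densify(sparse, visible, radius)
            shape = [s - lr * d for s, d in zip(shape, dense)]
    return shape


# Toy example: radial offsets on a 64-vertex ring, every other vertex observed.
n = 64
target = [1.0 + 0.3 * math.sin(2 * math.pi * i / n) + 0.05 * math.sin(12 * math.pi * i / n)
          for i in range(n)]
visible = [i % 2 == 0 for i in range(n)]
fitted = sequential_fit(target, visible)
```

In this toy run the observed vertices converge to the target by the final level, while the unobserved vertices are filled in from their neighbours. The wide first-level kernel recovers the broad shape but only part of the high-frequency ripple, which the narrower later levels then correct. A learned densifier would replace the fixed kernel with a network trained, as the abstract states, on synthetic data.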

Related Material


@InProceedings{Chen_2024_WACV,
    author    = {Chen, Jianchun and Vongkulbhisal, Jayakorn and De la Torre Frade, Fernando},
    title     = {A Sequential Learning-Based Approach for Monocular Human Performance Capture},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {3514-3523}
}