Deep Volumetric Video From Very Sparse Multi-View Performance Capture

Zeng Huang, Tianye Li, Weikai Chen, Yajie Zhao, Jun Xing, Chloe LeGendre, Linjie Luo, Chongyang Ma, Hao Li; Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 336-354


We present a deep learning-based volumetric capture approach for performance capture using a passive and highly sparse multi-view capture system. We focus on a template-free, per-frame 3D surface reconstruction from as few as three RGB sensors, where conventional visual hull or multi-view stereo methods would fail. State-of-the-art performance capture systems require either pre-scanned actors, a large number of cameras, or active sensors. We introduce a novel multi-view Convolutional Neural Network (CNN) that maps 2D images to a 3D volumetric field encoding the probabilistic distribution of surface points of the captured subject. By querying the resulting field, we can instantiate the clothed human body at arbitrary resolutions. Our approach also scales to different numbers of input images, yielding increased reconstruction quality as more views are used. Though trained only on synthetic data, our network generalizes to real captured performances. Since high-quality temporal surface reconstructions are possible, our method is suitable for low-cost full-body volumetric capture solutions for consumers, which are gaining popularity for VR and AR content creation. Experimental results demonstrate that our method is significantly more robust and accurate than existing techniques when only very sparse views are available.
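The abstract's key idea, querying a continuous volumetric field at arbitrary resolution and thresholding it to recover the surface interior, can be illustrated with a minimal sketch. Here the learned multi-view CNN is replaced by a hypothetical stand-in `query_field` (a smooth occupancy probability for a unit sphere); the function name, grid resolution, and 0.5 threshold are illustrative assumptions, not the paper's actual interface.

```python
import numpy as np

def query_field(points):
    # Hypothetical stand-in for the paper's multi-view CNN: a soft
    # occupancy probability for a unit sphere, used only to show the
    # query-the-field pattern (inside ~1, outside ~0).
    r = np.linalg.norm(points, axis=-1)
    return 1.0 / (1.0 + np.exp(10.0 * (r - 1.0)))

def sample_grid(resolution, bound=1.5):
    # Dense grid of 3D query points. Because the field is continuous,
    # the resolution is arbitrary: finer grids give finer reconstructions.
    axis = np.linspace(-bound, bound, resolution)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    return grid.reshape(-1, 3)

res = 32
probs = query_field(sample_grid(res)).reshape(res, res, res)
# Threshold the probabilistic field; the resulting binary volume can be
# meshed with marching cubes to instantiate the surface at this resolution.
occupied = probs > 0.5
print(occupied.mean())
```

In the paper's setting, `query_field` would instead project each 3D point into the sparse input views and let the network predict its surface probability from the pooled image features.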

Related Material

@InProceedings{Huang_2018_ECCV,
author = {Huang, Zeng and Li, Tianye and Chen, Weikai and Zhao, Yajie and Xing, Jun and LeGendre, Chloe and Luo, Linjie and Ma, Chongyang and Li, Hao},
title = {Deep Volumetric Video From Very Sparse Multi-View Performance Capture},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
}