Generating Multiple Hypotheses for 3D Human Mesh and Pose using Conditional Generative Adversarial Nets
Despite recent successes in 3D human mesh/pose recovery, reconstruction ambiguity remains a challenging problem that cannot be avoided when extreme lighting, occlusion, or self-occlusion occurs in a scene. We argue that multiple 3D human meshes may correspond to a single image from one viewpoint, because what happens under extreme lighting or behind occlusions and self-occlusions is genuinely unknown. In this paper, we address this problem using Conditional Generative Adversarial Nets (CGANs) to generate multiple hypotheses for the 3D human mesh and pose from a single image, conditioned on the 2D joints and the relative depth of the joints. Initial estimates of the 2D human skeleton, the relative depth between adjacent joints, and image features are taken as input to the CGAN to train the generator and discriminator. The trained generator is then used to produce multiple human meshes under different conditions that are consistent with the human silhouette and the 2D joint locations. Selection and clustering are applied to eliminate abnormal and redundant meshes. The number of hypotheses is not fixed across images; it depends on the ambiguity of the 2D pose. Unlike existing end-to-end 3D human mesh recovery methods, our approach consists of three task-specific deep networks trained separately, which mitigates the training burden in terms of time and datasets. We evaluate our approach qualitatively and quantitatively not only on laboratory and real-scene datasets but also on Internet images, and the experimental results demonstrate its effectiveness.