MEGA: Masked Generative Autoencoder for Human Mesh Recovery

Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Francesc Moreno-Noguer; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 5366-5378

Abstract


Human Mesh Recovery (HMR) from a single RGB image is a highly ambiguous problem, as an infinite set of 3D interpretations can explain the 2D observation equally well. Nevertheless, most HMR methods overlook this issue and make a single prediction without accounting for this ambiguity. A few approaches generate a distribution of human meshes, enabling the sampling of multiple predictions; however, none of them is competitive with the latest single-output model when making a single prediction. This work proposes a new approach based on masked generative modeling. By tokenizing the human pose and shape, we formulate the HMR task as generating a sequence of discrete tokens conditioned on an input image. We introduce MEGA, a MaskEd Generative Autoencoder trained to recover human meshes from images and partial human mesh token sequences. Given an image, our flexible generation scheme allows us to predict a single human mesh in deterministic mode or to generate multiple human meshes in stochastic mode. Experiments on in-the-wild benchmarks show that MEGA achieves state-of-the-art performance in deterministic and stochastic modes, outperforming single-output and multi-output approaches.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Fiche_2025_CVPR, author = {Fiche, Gu\'enol\'e and Leglaive, Simon and Alameda-Pineda, Xavier and Moreno-Noguer, Francesc}, title = {MEGA: Masked Generative Autoencoder for Human Mesh Recovery}, booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)}, month = {June}, year = {2025}, pages = {5366-5378} }