PostoMETRO: Pose Token Enhanced Mesh Transformer for Robust 3D Human Mesh Recovery

Yang, Wendi; Jiang, Zi-Hang; Zhao, Shang; Zhou, S. Kevin

Wendi Yang, Zi-Hang Jiang, Shang Zhao, S. Kevin Zhou; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 4746-4756

Abstract

With recent advances in single-image-based 3D human pose and shape estimation (3DHPSE), a growing number of works achieve good results on standard benchmarks but struggle to yield accurate human meshes in extreme scenarios such as occlusion. Previous works propose leveraging 2D poses to help 3DHPSE models improve performance under occlusion, but they usually rely on manual design to integrate 2D poses and target only specific kinds of occlusion. In this paper, we present PostoMETRO (POSe TOken enhanced MEsh TRansfOrmer), which integrates 2D pose prior knowledge as tokens into transformers to improve the model's performance under occlusion. Using a VQ-VAE-based pose tokenizer, we efficiently represent 2D poses as tokens and feed them to transformers together with image tokens. Subsequently, these tokens are queried by vertex and joint tokens to decode the 3D coordinates of mesh vertices and human joints. Our proposed 2D pose integration strategy is manual-design-free and suitable for various kinds of occlusion. Experiments on both standard and occlusion-specific benchmarks demonstrate the effectiveness of PostoMETRO.
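The tokenization step mentioned in the abstract can be illustrated with a minimal sketch of vector quantization, the core operation of a VQ-VAE: each continuous feature vector is replaced by the index of its nearest codebook entry. This is not the authors' implementation. The codebook, the feature values, the `quantize` helper, and the placeholder image tokens are all hypothetical, and in a real VQ-VAE both the encoder and the codebook are learned.

```python
import math

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    tokens = []
    for f in features:
        # Euclidean distance from this feature to every codebook entry.
        dists = [math.dist(f, c) for c in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens

# Hypothetical toy codebook: 3 entries of dimension 2 (learned in practice).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hypothetical encoder outputs for one 2D pose (assumed values).
pose_features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]

pose_tokens = quantize(pose_features, codebook)

# Pose tokens are placed alongside image tokens in one input sequence.
# The string labels are placeholders standing in for token embeddings.
image_tokens = ["img_0", "img_1"]
sequence = image_tokens + [f"pose_{t}" for t in pose_tokens]
```

In this toy setup each pose feature falls closest to a different codebook entry, so the pose is summarized by three discrete indices that can be handled like any other token in the sequence.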

Related Material

@InProceedings{Yang_2025_WACV,
    author    = {Yang, Wendi and Jiang, Zi-Hang and Zhao, Shang and Zhou, S. Kevin},
    title     = {PostoMETRO: Pose Token Enhanced Mesh Transformer for Robust 3D Human Mesh Recovery},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {4746-4756}
}