A Simple Baseline for Efficient Hand Mesh Reconstruction

Zhishan Zhou, Shihao Zhou, Zhi Lv, Minqiang Zou, Yao Tang, Jiajun Liang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 1367-1376

Abstract


Hand mesh reconstruction has attracted considerable attention in recent years with various approaches and techniques being proposed. Some of these methods incorporate complex components and designs which while effective may complicate the model and hinder efficiency. In this paper we decompose the mesh decoder into token generator and mesh regressor. Through extensive ablation experiments we found that the token generator should select discriminating and representative points while the mesh regressor needs to upsample sparse keypoints into dense meshes in multiple stages. Given these functionalities we can achieve high performance with minimal computational resources. Based on this observation we propose a simple yet effective baseline that outperforms state-of-the-art methods by a large margin while maintaining real-time efficiency. Our method outperforms existing solutions achieving state-of-the-art (SOTA) results across multiple datasets. On the FreiHAND dataset our approach produced a PA-MPJPE of 5.8mm and a PA-MPVPE of 6.1mm. Similarly on the DexYCB dataset we observed a PA-MPJPE of 5.5mm and a PA-MPVPE of 5.5mm. As for performance speed our method reached up to 33 frames per second (fps) when using HRNet and up to 70 fps when employing FastViT-MA36. Code will be made available.

Related Material


[pdf] [arXiv]
[bibtex]
@InProceedings{Zhou_2024_CVPR, author = {Zhou, Zhishan and Zhou, Shihao and Lv, Zhi and Zou, Minqiang and Tang, Yao and Liang, Jiajun}, title = {A Simple Baseline for Efficient Hand Mesh Reconstruction}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {1367-1376} }