TokenHand: Discrete Token Representation for Efficient Hand Mesh Reconstruction

Xinguo He, Yixin Shen, Rahul Chaudhari; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 8921-8931

Abstract


Hand mesh reconstruction has attracted growing attention in recent years. Despite significant progress, existing methods often struggle to balance reconstruction quality and inference efficiency. In this work, we propose TokenHand, a novel framework for single-view 3D hand mesh reconstruction that achieves both high accuracy and real-time inference. Our method represents a 3D hand model using M discrete tokens, each describing a specific sub-structure of the hand. This compositional representation enables efficient modeling with minimal reconstruction error. Furthermore, we reformulate hand mesh reconstruction as a classification problem rather than a regression task. Specifically, a classifier predicts the categories of the M tokens from an input image, and a pre-trained decoder network subsequently reconstructs the 3D hand mesh from the predicted tokens without any post-processing. Extensive experiments demonstrate that TokenHand achieves comparable or superior performance to existing methods across standard benchmarks, while maintaining high efficiency in practical scenarios.
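
The classify-then-decode pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the random linear maps stand in for the trained classifier and the pre-trained decoder, and every size and name here (`K`, `D`, `V`, `F`, `codebook`, `W_cls`, `W_dec`, `predict_tokens`, `decode_mesh`) is a hypothetical choice made for illustration. Only the overall flow (image features, then M token categories, then a mesh) comes from the abstract.

```python
import numpy as np

# All sizes are illustrative assumptions, not values from the paper.
M = 8     # number of discrete tokens per hand
K = 16    # assumed number of categories (codebook entries) per token
D = 4     # assumed embedding dimension of a codebook entry
V = 778   # assumed vertex count of the output hand mesh
F = 32    # assumed dimension of the image feature vector

rng = np.random.default_rng(0)
codebook = rng.normal(size=(K, D))        # stand-in for a learned codebook
W_cls = rng.normal(size=(F, M * K))       # stand-in for the classifier head
W_dec = rng.normal(size=(M * D, V * 3))   # stand-in for the pre-trained decoder

def predict_tokens(feat):
    """Classification step: pick one category per token via argmax over logits."""
    logits = (feat @ W_cls).reshape(M, K)
    return logits.argmax(axis=1)          # shape (M,), integers in [0, K)

def decode_mesh(token_ids):
    """Decoding step: look up token embeddings and map them to vertex coordinates."""
    emb = codebook[token_ids]             # shape (M, D)
    return (emb.reshape(-1) @ W_dec).reshape(V, 3)

feat = rng.normal(size=F)                 # placeholder for features of one input image
token_ids = predict_tokens(feat)
mesh = decode_mesh(token_ids)             # (V, 3) vertices, no post-processing step
```

Because the decoder input is a set of M categorical indices, the image-to-mesh mapping in this sketch reduces to M independent K-way classifications followed by a fixed decoding pass.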

Related Material


[pdf]
[bibtex]
@InProceedings{He_2026_CVPR,
    author    = {He, Xinguo and Shen, Yixin and Chaudhari, Rahul},
    title     = {TokenHand: Discrete Token Representation for Efficient Hand Mesh Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {8921-8931}
}