HandGCNFormer: A Novel Topology-Aware Transformer Network for 3D Hand Pose Estimation

Yintong Wang, LiLi Chen, Jiamao Li, Xiaolin Zhang; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 5675-5684

Abstract

Despite substantial progress in 3D hand pose estimation, inferring plausible and accurate poses under severe self-occlusion and high self-similarity remains an inherent challenge. To mitigate the ambiguity arising from invisible and similar joints, we propose a novel topology-aware Transformer network, HandGCNFormer, which incorporates prior knowledge of hand kinematic topology into the network while modeling long-range contextual information. Specifically, we present a novel Graphformer decoder with an additional node-offset graph convolutional layer (NoffGConv) that optimizes the synergy of the Transformer and the GCN, capturing long-range dependencies as well as local topological connections between joints. Furthermore, we replace the standard MLP prediction head with a novel topology-aware head to better exploit local topology constraints, yielding more plausible and accurate poses. Our method achieves state-of-the-art performance on four challenging datasets: Hands2017, NYU, ICVL, and MSRA.
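The core idea behind a node-offset graph convolution can be illustrated with a minimal sketch: each joint's feature is refined by an offset aggregated from its kinematic neighbors rather than being replaced outright. The 21-joint hand skeleton, the weight shapes, and the residual formulation below are illustrative assumptions, not the paper's exact NoffGConv design.

```python
import numpy as np

N_JOINTS = 21  # wrist + 4 joints per finger (a common hand-model convention)

def hand_adjacency():
    """Normalized adjacency of a simple 21-joint hand skeleton
    (wrist connected to five finger chains of four joints each)."""
    edges = []
    for f in range(5):                        # five fingers
        base = 1 + 4 * f
        edges.append((0, base))               # wrist -> finger base
        edges += [(base + i, base + i + 1) for i in range(3)]
    A = np.eye(N_JOINTS)                      # self-loops
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d = A.sum(axis=1)
    return A / np.sqrt(np.outer(d, d))        # symmetric normalization

def noff_gconv(X, W, A_norm):
    """X: (N_JOINTS, C) joint features. Returns X plus a per-node offset
    aggregated over topological neighbors (residual graph convolution)."""
    offset = A_norm @ X @ W                   # neighbor-aggregated offset
    return X + offset                         # keep node features, add offset

rng = np.random.default_rng(0)
X = rng.standard_normal((N_JOINTS, 64))       # e.g. decoder joint queries
W = rng.standard_normal((64, 64)) * 0.01      # hypothetical learned weights
Y = noff_gconv(X, W, hand_adjacency())
print(Y.shape)                                # (21, 64)
```

In practice such a layer would sit inside the Graphformer decoder, with the Transformer attention supplying long-range context and the graph convolution enforcing the local kinematic structure.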

Related Material

[pdf] [supp]
[bibtex]
@InProceedings{Wang_2023_WACV,
    author    = {Wang, Yintong and Chen, LiLi and Li, Jiamao and Zhang, Xiaolin},
    title     = {HandGCNFormer: A Novel Topology-Aware Transformer Network for 3D Hand Pose Estimation},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
    pages     = {5675-5684}
}