@InProceedings{Zhang_2026_CVPR,
    author    = {Zhang, Xinyu and Kou, Ziyi and Qin, Chuan and Huang, Mia and Ristani, Ergys and Kumar, Ankit and Chen, Lele and He, Kun and Boularias, Abdeslam and Guan, Li},
    title     = {Glove2Hand: Synthesizing Natural Hand-Object Interaction from Multi-Modal Sensing Gloves},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {1829-1840}
}
Glove2Hand: Synthesizing Natural Hand-Object Interaction from Multi-Modal Sensing Gloves
Abstract
Understanding hand-object interaction (HOI) is fundamental to computer vision, robotics, and AR/VR. However, conventional hand videos often lack essential physical information, such as contact forces and motion dynamics, and are prone to frequent occlusions. To address these challenges, we present Glove2Hand, a framework that translates multi-modal sensing glove data in HOI videos into photorealistic bare-hand representations, while faithfully preserving the underlying physical interaction dynamics. We introduce a novel 3D Gaussian hand model that ensures both temporal and multi-view rendering consistency. The rendered hand is seamlessly integrated into the scene using a diffusion-based hand restorer, which effectively handles complex hand-object interactions and non-rigid deformations. Leveraging Glove2Hand, we introduce HandSense, the first multi-modal HOI dataset featuring multi-view bare-hand videos with synchronized tactile and IMU signals. We demonstrate that HandSense significantly enhances downstream bare-hand applications, including video-based contact estimation and hand tracking under severe occlusion.

