GEARS: Local Geometry-aware Hand-object Interaction Synthesis
Abstract
Generating realistic hand motion sequences in interaction with objects has gained increasing attention with the growing interest in digital humans. Prior work has illustrated the effectiveness of employing occupancy-based or distance-based virtual sensors to extract hand-object interaction features. Nonetheless, these methods show limited generalizability across object categories, shapes, and sizes. We hypothesize that this is due to two reasons: 1) the limited expressiveness of the employed virtual sensors and 2) the scarcity of available training data. To tackle this challenge, we introduce a novel joint-centered sensor designed to reason about local object geometry near potential interaction regions. The sensor queries for object surface points in the neighbourhood of each hand joint. As an important step towards mitigating the learning complexity, we transform the points from the global frame to the hand template frame and use a shared module to process the sensor features of each individual joint. This is followed by a spatio-temporal transformer network aimed at capturing correlations among the joints along different dimensions. Moreover, we devise simple heuristic rules to augment the limited training sequences with a large set of static hand grasping samples. This broadens the spectrum of grasping types observed during training, in turn enhancing our model's generalization capability. We evaluate on two public datasets, GRAB and InterCap, where our method shows superiority over baselines both quantitatively and perceptually.
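The joint-centered sensor described in the abstract can be illustrated with a short sketch: for every hand joint, nearby object surface points are gathered and re-expressed in a joint-local, hand-template-aligned frame before being passed to a shared per-joint module. This is a minimal illustration, not the authors' implementation; the function name, array shapes, neighbourhood size k, and query radius are all assumptions.

# Minimal sketch (not the authors' code) of a joint-centered sensor:
# query object surface points near each hand joint and express them in
# that joint's template-aligned local frame. Names and shapes are assumed.
import numpy as np

def joint_centered_sensor(joints, joint_rotations, object_points, k=16, radius=0.05):
    """joints: (J, 3) joint positions in the global frame.
    joint_rotations: (J, 3, 3) rotations mapping global coords to each
        joint's hand-template-aligned frame (assumed given).
    object_points: (N, 3) points sampled on the object surface.
    Returns (J, k, 3) per-joint point sets in joint-local coordinates;
    entries beyond the points found within `radius` remain zero."""
    J = joints.shape[0]
    features = np.zeros((J, k, 3), dtype=np.float32)
    for j in range(J):
        offsets = object_points - joints[j]            # (N, 3) global offsets
        dists = np.linalg.norm(offsets, axis=1)
        nearest = np.argsort(dists)[:k]                # k nearest surface points
        valid = nearest[dists[nearest] <= radius]      # keep only close ones
        if valid.size == 0:
            continue
        # Transform offsets into the joint's local frame, removing global pose
        # from the sensor signal before a shared per-joint module processes it.
        local = offsets[valid] @ joint_rotations[j].T  # (V, 3)
        features[j, :valid.size] = local
    return features

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    joints = rng.normal(size=(21, 3)).astype(np.float32)          # e.g. 21 hand joints
    rotations = np.tile(np.eye(3, dtype=np.float32), (21, 1, 1))  # identity frames for demo
    surface = rng.normal(size=(2048, 3)).astype(np.float32)       # sampled object surface
    print(joint_centered_sensor(joints, rotations, surface).shape)  # (21, 16, 3)

In the paper, features of this kind would then be consumed by a shared per-joint module and a spatio-temporal transformer; that part is omitted here.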
Related Material
[pdf]
[arXiv]
[bibtex]
@InProceedings{Zhou_2024_CVPR,
  author    = {Zhou, Keyang and Bhatnagar, Bharat Lal and Lenssen, Jan Eric and Pons-Moll, Gerard},
  title     = {GEARS: Local Geometry-aware Hand-object Interaction Synthesis},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2024},
  pages     = {20634-20643}
}