Task-Oriented Human-Object Interactions Generation With Implicit Neural Representations

Quanzhou Li, Jingbo Wang, Chen Change Loy, Bo Dai; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 3035-3044

Abstract


Digital human motion synthesis is a vibrant research field with applications in movies, AR/VR, and video games. Although many methods have been proposed to generate natural and realistic human motions, most focus only on modeling humans and largely ignore object movements. Generating task-oriented human-object interaction motions in simulation is challenging: depending on the intended use of an object, humans perform different motions, and the human must first approach the object and then move it consistently with the body rather than leaving it still. Moreover, for deployment in downstream applications, the synthesized motions should be flexible in length, so that the predicted motions can be personalized for various purposes. To this end, we propose TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations, which generates full human-object interaction motions to conduct specific tasks, given only the task type, the object, and a starting human status. TOHO generates human-object motions in four steps: 1) it first estimates the object's final position given the task intent; 2) it then generates keyframe poses grasping the object; 3) after that, it infills the keyframes and generates continuous motions; 4) finally, it applies a compact closed-form object motion estimation to generate the object motion. Our method generates continuous motions that are parameterized only by the temporal coordinate, which allows upsampling the sequence to an arbitrary number of frames and adjusting the motion speed by designing the temporal coordinate vector. This work takes a step further toward general human-scene interaction simulation.
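
The idea of a motion parameterized only by the temporal coordinate can be illustrated with a minimal sketch. This is not the TOHO model: the function `make_toy_motion_field`, its random weights, the sinusoidal encoding, and the 6-dimensional "pose" are all assumptions chosen for illustration, standing in for a learned implicit representation. The sketch only shows how a continuous function of time can be queried with different temporal coordinate vectors to upsample a sequence or change its speed profile.

```python
import numpy as np

def make_toy_motion_field(pose_dim=6, hidden=32, n_freqs=4, seed=0):
    """Return f(t): a toy stand-in for a learned implicit motion representation.

    Hypothetical helper: the weights are random, not trained on any motion data.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(size=(2 * n_freqs, hidden))
    w2 = rng.normal(size=(hidden, pose_dim))
    freqs = (2.0 ** np.arange(n_freqs)) * np.pi

    def field(t):
        # t: temporal coordinates in [0, 1], any length
        t = np.asarray(t, dtype=float).reshape(-1, 1)      # (T, 1)
        angles = t * freqs                                  # (T, n_freqs)
        enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)
        return np.tanh(enc @ w1) @ w2                       # (T, pose_dim)

    return field

field = make_toy_motion_field()

# Three temporal coordinate vectors over the same motion.
t_base = np.linspace(0.0, 1.0, 31)    # 31 frames, uniform speed
t_up = np.linspace(0.0, 1.0, 121)     # 4x denser sampling of the same motion
t_ease = t_base ** 2                  # same frame count, slow start and fast finish

motion_base = field(t_base)   # shape (31, 6)
motion_up = field(t_up)       # shape (121, 6)
motion_ease = field(t_ease)   # shape (31, 6)
```

Because the representation is a function of time rather than a fixed-length array, every fourth frame of `motion_up` coincides with `motion_base`, and `motion_ease` starts and ends at the same poses while traversing the motion at a non-uniform speed.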

Related Material


@InProceedings{Li_2024_WACV,
    author    = {Li, Quanzhou and Wang, Jingbo and Loy, Chen Change and Dai, Bo},
    title     = {Task-Oriented Human-Object Interactions Generation With Implicit Neural Representations},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {3035-3044}
}