SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis

Wenkun He, Yun Liu, Ruitao Liu, Li Yi; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 11731-11743

Abstract


Synthesizing realistic human-object interaction motions is a critical problem in VR/AR and human animation. Unlike the commonly studied scenarios involving a single human or hand interacting with one object, we address a more generic multi-body setting with arbitrary numbers of humans, hands, and objects. The high correlations and mutual influences among bodies lead to two major challenges, for which we propose solutions. First, to satisfy the high demand for synchronization among different body motions, we mathematically derive a new set of alignment scores during the training process, and use maximum likelihood sampling on a dynamic graphical model for explicit synchronization during inference. Second, the high-frequency interactions between objects are often overshadowed by the large-scale low-frequency movements. To address this, we introduce frequency decomposition and explicitly represent high-frequency components in the frequency domain. Extensive experiments across five datasets with various multi-body configurations demonstrate the superiority of SyncDiff over existing state-of-the-art motion synthesis methods.
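To illustrate the frequency-decomposition idea, here is a minimal NumPy sketch (not the paper's implementation; the cutoff index and 1-D trajectory are illustrative assumptions) that splits a motion signal into a low-frequency component capturing large-scale movement and a high-frequency residual capturing fine interaction detail:

```python
import numpy as np

def frequency_decompose(motion, cutoff=4):
    """Split a 1-D motion trajectory into low- and high-frequency parts.

    Toy illustration: frequency bins below `cutoff` (an assumed
    hyperparameter) form the low-frequency component (large-scale
    movement); the residual carries the high-frequency detail that
    would otherwise be overshadowed.
    """
    spectrum = np.fft.rfft(motion)
    low_spec = spectrum.copy()
    low_spec[cutoff:] = 0.0                 # keep only the lowest bins
    low = np.fft.irfft(low_spec, n=len(motion))
    high = motion - low                     # residual = high frequencies
    return low, high

# Example: a slow sine (gross motion) plus fast jitter (contact detail).
t = np.linspace(0, 1, 128, endpoint=False)
motion = np.sin(2 * np.pi * t) + 0.1 * np.sin(2 * np.pi * 20 * t)
low, high = frequency_decompose(motion)
assert np.allclose(low + high, motion)      # decomposition is exact
```

Representing `high` explicitly in the frequency domain, as the abstract describes, keeps these small fast components from being drowned out by the much larger low-frequency terms during learning.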

Related Material


[bibtex]
@InProceedings{He_2025_ICCV,
    author    = {He, Wenkun and Liu, Yun and Liu, Ruitao and Yi, Li},
    title     = {SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {11731-11743}
}