Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models

Lorenzo Mandelli, Stefano Berretti; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 1279-1288

Abstract


In this paper, we address the challenge of generating realistic 3D human motions for action classes that were never seen during training. Our approach decomposes complex actions into simpler movements, specifically those observed during training, by leveraging the knowledge of human motion contained in GPT models. These simpler movements are then combined into a single realistic animation using the properties of diffusion models. Our claim is that this decomposition and subsequent recombination of simple movements can synthesize an animation that accurately represents the complex input action. The method operates at inference time and can be integrated with any pre-trained diffusion model, enabling the synthesis of motion classes not present in the training data. We evaluate our method by dividing two benchmark human motion datasets into basic and complex actions, and then compare its performance against the state of the art. Our code and models are publicly available at https://github.com/divanoLetto/MotionCompositionDiffusion.
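The abstract does not spell out the composition rule, so the following is only an illustrative sketch of how noise predictions for several simple movements might be blended at one denoising step. It assumes a formulation in the style of compositional diffusion sampling: each simple movement contributes the difference between its conditional and the unconditional prediction, weighted by a per-frame activity mask. The function name `compose_noise`, the mask representation, and the normalisation are all hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def compose_noise(eps_uncond, eps_conds, masks, guidance=1.0):
    """Blend per-movement noise predictions into one prediction (hypothetical sketch).

    eps_uncond: (frames, features) unconditional noise prediction.
    eps_conds:  list of (frames, features) predictions, one per simple movement.
    masks:      list of (frames,) arrays in [0, 1]; where each movement is active.
    guidance:   scale applied to the blended conditional direction (assumed).
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    total = np.zeros_like(eps_uncond)
    weight = np.zeros(eps_uncond.shape[0])

    for eps_c, mask in zip(eps_conds, masks):
        mask = np.asarray(mask, dtype=float)
        # Direction that pushes the sample towards this movement, per frame.
        total += mask[:, None] * (np.asarray(eps_c, dtype=float) - eps_uncond)
        weight += mask

    # Average the directions where movements overlap in time; frames with
    # no active movement fall back to the unconditional prediction.
    norm = np.maximum(weight, 1.0)
    return eps_uncond + guidance * total / norm[:, None]


# Toy example: two movements over four frames, overlapping on frame 1.
eps_u = np.zeros((4, 2))
eps_a = np.ones((4, 2))
eps_b = 3.0 * np.ones((4, 2))
blended = compose_noise(
    eps_u,
    [eps_a, eps_b],
    [np.array([1, 1, 0, 0]), np.array([0, 1, 1, 0])],
)
```

In this toy setup the masks encode temporal composition only; a spatial variant could, under the same assumptions, use masks over body-part features instead of frames.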

Related Material


[bibtex]
@InProceedings{Mandelli_2025_WACV,
    author    = {Mandelli, Lorenzo and Berretti, Stefano},
    title     = {Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {1279-1288}
}