MotionGPT: Human Motion Synthesis With Improved Diversity and Realism via GPT-3 Prompting

Jose Ribeiro-Gomes, Tianhui Cai, Zoltán Á. Milacski, Chen Wu, Aayush Prakash, Shingo Takagi, Amaury Aubel, Daeil Kim, Alexandre Bernardino, Fernando De la Torre; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 5070-5080

Abstract


Human motion synthesis has numerous applications, including animation, gaming, robotics, and sports science. In recent years, human motion generation from natural language has emerged as a promising alternative to costly and labor-intensive data collection methods relying on motion capture or wearable sensors (e.g., suits). Nevertheless, generating human motion from textual descriptions remains a challenging and intricate task, primarily due to the scarcity of large-scale supervised datasets capable of capturing the full diversity of human activity. This study proposes a new approach, called MotionGPT, that addresses the limitations of previous text-based human motion generation methods by utilizing the extensive semantic information available in large language models (LLMs). We first pretrain a doubly text-conditional motion diffusion model on both coarse ("high-level") and detailed ("low-level") ground truth text data. Then, during inference, we improve motion diversity and alignment with the training set by zero-shot prompting GPT-3 for additional "low-level" details. Our method achieves new state-of-the-art quantitative results in terms of Fréchet Inception Distance (FID) and motion diversity metrics, and improves all considered metrics. Furthermore, it has strong qualitative performance, producing natural results.
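
As an illustration of the inference-time step described above, the following is a minimal sketch of how a coarse description might be expanded into a detailed one by a zero-shot prompt, so that both texts can condition the motion model. It is not the authors' implementation: the prompt wording, the names `TextConditions`, `build_detail_prompt`, `make_conditions`, and the offline stand-in `fake_llm` are all assumptions made for this example, and the diffusion sampler itself is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TextConditions:
    """The two text inputs of a doubly text-conditional model (hypothetical type)."""
    high_level: str  # coarse description, e.g. "a person jumps."
    low_level: str   # detailed description supplied by the language model


def build_detail_prompt(high_level: str) -> str:
    """Build a zero-shot prompt: an instruction only, with no worked examples."""
    # Assumption: this wording is illustrative, not the prompt used in the paper.
    action = high_level.strip().rstrip(".")
    return f"Describe in detail how the body parts move when {action}."


def make_conditions(high_level: str, llm: Callable[[str], str]) -> TextConditions:
    """Query the language model once and pair its answer with the input text."""
    low_level = llm(build_detail_prompt(high_level)).strip()
    return TextConditions(high_level=high_level, low_level=low_level)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real completion API so the sketch runs offline."""
    return " The knees bend, then both legs push off the ground. "


conditions = make_conditions("a person jumps.", fake_llm)
print(conditions.high_level)
print(conditions.low_level)
```

Passing the language model in as a plain callable keeps the sketch independent of any particular API; a real completion client could be substituted for `fake_llm` without changing the other functions.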

Related Material


@InProceedings{Ribeiro-Gomes_2024_WACV,
    author    = {Ribeiro-Gomes, Jose and Cai, Tianhui and Milacski, Zolt\'an \'A. and Wu, Chen and Prakash, Aayush and Takagi, Shingo and Aubel, Amaury and Kim, Daeil and Bernardino, Alexandre and De la Torre, Fernando},
    title     = {MotionGPT: Human Motion Synthesis With Improved Diversity and Realism via GPT-3 Prompting},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {5070-5080}
}