-
[pdf]
[supp]
[arXiv]
[bibtex]@InProceedings{Yang_2024_CVPR, author = {Yang, Zhangsihao and Zhou, Mingyuan and Shan, Mengyi and Wen, Bingbing and Xuan, Ziwei and Hill, Mitch and Bai, Junjie and Qi, Guo-Jun and Wang, Yalin}, title = {OmniMotionGPT: Animal Motion Generation with Limited Data}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {1249-1259} }
OmniMotionGPT: Animal Motion Generation with Limited Data
Abstract
Our paper aims to generate diverse and realistic animal motion sequences from textual descriptions without a large-scale animal text-motion dataset. While the task of text-driven human motion synthesis is already extensively studied and benchmarked it remains challenging to transfer this success to other skeleton structures with limited data. In this work we design a model architecture that imitates Generative Pretraining Transformer (GPT) utilizing prior knowledge learned from human data to the animal domain. We jointly train motion autoencoders for both animal and human motions and at the same time optimize through the similarity scores among human motion encoding animal motion encoding and text CLIP embedding. Presenting the first solution to this problem we are able to generate animal motions with high diversity and fidelity quantitatively and qualitatively outperforming the results of training human motion generation baselines on animal data. Additionally we introduce AnimalML3D the first text-animal motion dataset with 1240 animation sequences spanning 36 different animal identities. We hope this dataset would mediate the data scarcity problem in text-driven animal motion generation providing a new playground for the research community.
Related Material