Exploring Text-to-Motion Generation with Human Preference

Jenny Sheng, Matthieu Lin, Andrew Zhao, Kevin Pruvost, Yu-Hui Wen, Yangguang Li, Gao Huang, Yong-Jin Liu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 1888-1899

Abstract


This paper presents an exploration of preference learning in text-to-motion generation. We find that current improvements in text-to-motion generation still rely on datasets that require expert labelers with motion capture systems. Learning from human preference data, by contrast, requires no motion capture system: a labeler with no special expertise simply compares two generated motions. This is particularly efficient because evaluating the model's output is easier than gathering a motion that performs a desired task (e.g., a backflip). To pioneer the exploration of this paradigm, we annotate 3,528 preference pairs generated by MotionGPT, marking the first effort to investigate various algorithms for learning from preference data. In particular, our exploration highlights important design choices when using preference data. Additionally, our experimental results show that preference learning has the potential to greatly improve current text-to-motion generative models. Our code and dataset are publicly available at https://github.com/THU-LYJ-Lab/InstructMotion to further facilitate research in this area.
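As a rough illustration of the pairwise-comparison paradigm described above, the sketch below shows a Bradley-Terry reward-model objective, one common way to learn from preference pairs; the abstract does not specify which algorithm the paper adopts, so this is not necessarily the authors' method. The MotionRewardModel class, its small MLP scorer, and the random tensors standing in for encoded (text, motion) features are illustrative assumptions, not components of the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionRewardModel(nn.Module):
    """Hypothetical reward model that scores an encoded (text, motion) pair.

    In practice the features would come from a pretrained motion-language
    backbone (e.g., MotionGPT); here a small MLP stands in.
    """
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # One scalar reward per sample.
        return self.score(features).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximize the probability that the
    # preferred motion outscores the dispreferred one,
    # P(chosen > rejected) = sigmoid(r_chosen - r_rejected).
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: random features stand in for encoded preference pairs.
model = MotionRewardModel()
chosen = torch.randn(8, 512)    # features of the labeler-preferred motions
rejected = torch.randn(8, 512)  # features of the dispreferred motions
loss = preference_loss(model(chosen), model(rejected))
loss.backward()

A reward model trained this way can then supply a learning signal (e.g., for RL-style fine-tuning) without any additional motion capture, which is the efficiency argument the abstract makes.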

Related Material


BibTeX
@InProceedings{Sheng_2024_CVPR,
    author    = {Sheng, Jenny and Lin, Matthieu and Zhao, Andrew and Pruvost, Kevin and Wen, Yu-Hui and Li, Yangguang and Huang, Gao and Liu, Yong-Jin},
    title     = {Exploring Text-to-Motion Generation with Human Preference},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {1888-1899}
}