Sequential Texts Driven Cohesive Motions Synthesis with Natural Transitions

Shuai Li, Sisi Zhuang, Wenfeng Song, Xinyu Zhang, Hejia Chen, Aimin Hao; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 9498-9508


The intelligent synthesis/generation of daily-life motion sequences is fundamental and urgently needed for many VR/metaverse-related applications. However, existing approaches commonly focus on monotonic motion generation (e.g., walking, jumping, etc.) based on single instruction-like text, which is still not intelligent enough and can't meet practical demands. To this end, we propose a cohesive human motion sequence synthesis framework based on free-form sequential texts while ensuring semantic connection and natural transitions between adjacent motions. At the technical level, we explore the local-to-global semantic features of previous and current texts to extract relevant information. This information is used to guide the framework in understanding the semantics of the current moment. Moreover, we propose learnable tokens to adaptively learn the influence range of the previous motions towards natural transitions. These tokens can be trained to encode the relevant information into well-designed transition loss. To demonstrate the efficacy of our method, we conduct extensive experiments and comprehensive evaluations on the public dataset as well as a new dataset produced by us. All the experiments confirm that our method outperforms the state-of-the-art methods in terms of semantic matching, realism, and transition fluency. Our project is public available.

Related Material

[pdf] [supp]
@InProceedings{Li_2023_ICCV, author = {Li, Shuai and Zhuang, Sisi and Song, Wenfeng and Zhang, Xinyu and Chen, Hejia and Hao, Aimin}, title = {Sequential Texts Driven Cohesive Motions Synthesis with Natural Transitions}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2023}, pages = {9498-9508} }