T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences

Taeryung Lee, Fabien Baradel, Thomas Lucas, Kyoung Mu Lee, Grègory Rogez; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 1867-1876

Abstract


In this paper, we address the challenging problem of long-term 3D human motion generation. Specifically, we aim to generate a long sequence of smoothly connected actions from a stream of multiple sentences (i.e., a paragraph). Previous long-term motion generation approaches were mostly recurrent, using previously generated motion chunks as input for the next step. However, this approach has two drawbacks: 1) it relies on sequential datasets, which are expensive; 2) it yields unrealistic gaps between the motions generated at each step. To address these issues, we introduce T2LM, a simple yet effective continuous long-term generation framework that can be trained without sequential data. T2LM comprises two components: a 1D-convolutional VQ-VAE trained to compress motion into sequences of latent vectors, and a Transformer-based text encoder that predicts a latent sequence given an input text. At inference, a sequence of sentences is translated into a continuous stream of latent vectors, which is then decoded into a motion by the VQ-VAE decoder; the use of 1D convolutions with a local temporal receptive field avoids temporal inconsistencies between training and generated sequences. This simple constraint on the VQ-VAE allows it to be trained with short sequences only and produces smoother transitions. T2LM outperforms prior long-term generation models while removing the requirement for sequential data; it is also competitive with state-of-the-art single-action generation models.
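The role of the local temporal receptive field can be illustrated with a toy example. The sketch below is not the paper's architecture: it assumes scalar latents instead of latent vectors, and a single hypothetical 3-tap filter (`kernel`) stands in for the VQ-VAE decoder. It compares decoding two per-sentence latent sequences separately against decoding them as one concatenated stream, and shows that the outputs differ only within the receptive field around the sentence boundary.

```python
def conv1d_same(seq, kernel):
    """Zero-padded 1D convolution; output has the same length as the input."""
    half = len(kernel) // 2
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - half
            if 0 <= idx < len(seq):  # positions outside the sequence count as zero
                acc += w * seq[idx]
        out.append(acc)
    return out

# Hypothetical decoder filter with a receptive field of 3 time steps.
kernel = [0.25, 0.5, 0.25]

# Stand-ins for the latent sequences predicted from two sentences.
latents_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
latents_b = [8.0, 6.0, 4.0, 2.0, 1.0, 3.0, 5.0, 7.0]

# Decoding each chunk on its own vs. decoding the concatenated stream.
decoded_separately = conv1d_same(latents_a, kernel) + conv1d_same(latents_b, kernel)
decoded_as_stream = conv1d_same(latents_a + latents_b, kernel)

# Time steps where the two decodings disagree.
differing = [
    t for t in range(len(decoded_as_stream))
    if abs(decoded_as_stream[t] - decoded_separately[t]) > 1e-12
]
print(differing)
```

In this toy setting, only the two time steps adjacent to the boundary change when the chunks are decoded as a stream; every other output matches what the decoder produces on a short sequence alone. This is one way to read the abstract's claim that a decoder with a local receptive field can be trained on short sequences and still be applied to a long latent stream.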

Related Material


[bibtex]
@InProceedings{Lee_2024_CVPR,
    author    = {Lee, Taeryung and Baradel, Fabien and Lucas, Thomas and Lee, Kyoung Mu and Rogez, Gr\`egory},
    title     = {T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {1867-1876}
}