A Cross-Dataset Study for Text-based 3D Human Motion Retrieval

Léore Bensabath, Mathis Petrovich, Gul Varol; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 1932-1940

Abstract


We provide results of our study on text-based 3D human motion retrieval and particularly focus on cross-dataset generalization. Due to practical reasons such as dataset-specific human body representations, existing works typically benchmark by training and testing on partitions from the same dataset. Here, we employ a unified SMPL body format for all datasets, which allows us to perform training on one dataset, testing on the other, as well as training on a combination of datasets. Our results suggest that there exist dataset biases in standard text-motion benchmarks such as HumanML3D, KIT Motion-Language, and BABEL. We show that text augmentations help close the domain gap to some extent, but the gap remains. We further provide the first zero-shot action recognition results on BABEL, without using categorical action labels during training, opening up a new avenue for future research.

Related Material


@InProceedings{Bensabath_2024_CVPR,
    author    = {Bensabath, L\'eore and Petrovich, Mathis and Varol, Gul},
    title     = {A Cross-Dataset Study for Text-based 3D Human Motion Retrieval},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {1932-1940}
}