Motion Diversification Networks

Hee Jae Kim, Eshed Ohn-Bar; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 1650-1660

Abstract


We introduce Motion Diversification Networks, a novel framework for learning to generate realistic and diverse 3D human motion. Despite recent advances in deep generative motion modeling, existing models often fail to produce samples that capture the full range of plausible and natural 3D human motion within a given context. The lack of diversity becomes even more apparent in applications where subtle and multi-modal 3D human forecasting is crucial for safety, such as robotics and autonomous driving. Towards more realistic and functional 3D motion models, we highlight limitations in existing generative modeling techniques, particularly in overly simplistic latent code sampling strategies. We then introduce a transformer-based diversification mechanism that learns to effectively guide sampling in the latent space. Our proposed attention-based module queries multiple stochastic samples to flexibly predict a diverse set of latent codes, which can subsequently be decoded into motion samples. The proposed framework achieves state-of-the-art diversity and prediction accuracy across a range of benchmarks and settings, particularly when used to forecast intricate in-the-wild 3D human motion within complex urban environments. Our models, datasets, and code are available at https://mdncvpr.github.io/.
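
To make the abstract's description of the diversification mechanism concrete, below is a minimal PyTorch sketch of an attention-based module that maps a pool of stochastic noise samples to a set of diverse latent codes via learned queries and cross-attention. All names (LatentDiversifier), dimensions, and the use of nn.MultiheadAttention are illustrative assumptions based only on the abstract; the paper's actual architecture may differ.

    # Minimal sketch: learned queries attend over stochastic samples to
    # produce multiple latent codes. Hypothetical, not the paper's code.
    import torch
    import torch.nn as nn

    class LatentDiversifier(nn.Module):
        """Maps a set of noise samples to a set of diverse latent codes."""

        def __init__(self, latent_dim: int = 128, num_codes: int = 10,
                     num_heads: int = 4):
            super().__init__()
            # One learned query per output latent code (assumed design).
            self.queries = nn.Parameter(torch.randn(num_codes, latent_dim))
            self.attn = nn.MultiheadAttention(latent_dim, num_heads,
                                              batch_first=True)
            self.proj = nn.Linear(latent_dim, latent_dim)

        def forward(self, noise: torch.Tensor) -> torch.Tensor:
            # noise: (batch, num_samples, latent_dim), e.g. drawn from a
            # standard Gaussian prior.
            batch = noise.shape[0]
            q = self.queries.unsqueeze(0).expand(batch, -1, -1)
            # Queries attend over the pool of stochastic samples.
            codes, _ = self.attn(q, noise, noise)
            return self.proj(codes)  # (batch, num_codes, latent_dim)

    if __name__ == "__main__":
        diversifier = LatentDiversifier()
        z = torch.randn(2, 32, 128)      # 32 noise samples per item
        latent_codes = diversifier(z)    # 10 diverse codes per item
        print(latent_codes.shape)        # torch.Size([2, 10, 128])
        # Each code would then be fed to a pretrained motion decoder
        # (not shown) to produce one candidate 3D motion sample.

In this sketch, diversity comes from each learned query attending to a different mixture of the stochastic samples, rather than from decoding independent i.i.d. draws, which is the overly simplistic sampling strategy the abstract critiques.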

Related Material


@InProceedings{Kim_2024_CVPR,
    author    = {Kim, Hee Jae and Ohn-Bar, Eshed},
    title     = {Motion Diversification Networks},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {1650-1660}
}