KinMo: Kinematic-aware Human Motion Understanding and Generation
Abstract
Current human motion synthesis frameworks rely on global action descriptions, creating a modality gap that limits both motion understanding and generation. A single coarse description, such as "run", fails to capture essential details like variations in speed, limb positioning, and kinematic dynamics, leading to significant ambiguities between the text and motion modalities. To address this challenge, we introduce KinMo, a unified framework built on a hierarchical describable motion representation that extends beyond global actions by incorporating kinematic group movements and their interactions. We design an automated annotation pipeline to generate high-quality, fine-grained descriptions for this decomposition, resulting in the KinMo dataset. To leverage these structured descriptions, we propose Hierarchical Text-Motion Alignment, which improves spatial understanding by integrating additional motion details. Furthermore, we introduce a coarse-to-fine generation procedure to demonstrate how enhanced spatial understanding benefits motion synthesis. Experimental results show that KinMo significantly improves motion understanding, as demonstrated by stronger text-motion retrieval performance, and enables more fine-grained motion generation and editing. Project Page: https://andypinxinliu.github.io/KinMo
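To make the abstract's three-level decomposition concrete, here is a minimal sketch of what such a hierarchical motion description might look like as a data structure. All field and class names are illustrative assumptions, not the KinMo dataset's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical illustration of a hierarchical motion description,
# loosely following the abstract's three levels: global action,
# kinematic-group movements, and cross-group interactions.
# Names here are assumptions for illustration only.

@dataclass
class KinematicGroupMotion:
    group: str        # e.g. "left arm", "torso"
    description: str  # fine-grained movement text

@dataclass
class HierarchicalMotionDescription:
    global_action: str  # coarse label, e.g. "run"
    group_movements: list[KinematicGroupMotion] = field(default_factory=list)
    interactions: list[str] = field(default_factory=list)  # coordination between groups

# Example: the coarse label "run" enriched with per-group detail.
example = HierarchicalMotionDescription(
    global_action="run",
    group_movements=[
        KinematicGroupMotion("left arm", "swings forward at shoulder height"),
        KinematicGroupMotion("right leg", "drives backward with a strong push-off"),
    ],
    interactions=["arms and legs alternate in opposite phase"],
)
```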
Related Material
[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Zhang_2025_ICCV,
  author    = {Zhang, Pengfei and Liu, Pinxin and Garrido, Pablo and Kim, Hyeongwoo and Chaudhuri, Bindita},
  title     = {KinMo: Kinematic-aware Human Motion Understanding and Generation},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025},
  pages     = {11187-11197}
}