ModSkill: Physical Character Skill Modularization

Yiming Huang, Zhiyang Dou, Lingjie Liu; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 12394-12404

Abstract


Human motion is highly diverse and dynamic, posing challenges for imitation learning algorithms that aim to generalize motor skills for controlling simulated characters. Prior methods typically rely on a universal full-body controller for tracking reference motions (tracking-based models) or on a unified full-body skill embedding space (skill-embedding models). However, these approaches often struggle to generalize and scale to larger motion datasets. In this work, we introduce a novel skill learning framework, ModSkill, that decouples complex full-body skills into compositional, modular skills for independent body parts, leveraging a body-structure-inspired inductive bias to enhance skill learning performance. Our framework features a skill modularization attention mechanism that processes policy observations into modular skill embeddings, which guide a low-level controller for each body part. We further propose Generative Adaptive Sampling for Active Skill Learning, which uses large motion generation models to adaptively enhance policy learning in challenging tracking scenarios. Results show that this modularized skill learning framework, enhanced by generative sampling, outperforms existing methods in precise full-body motion tracking and enables reusable skill embeddings for diverse goal-driven tasks.
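To make the skill modularization attention mechanism concrete, the sketch below shows one plausible PyTorch-style reading of the abstract: a full-body policy observation is encoded into tokens, learnable per-body-part queries attend over those tokens to produce one skill embedding per part, and each embedding conditions a small per-part controller. This is a minimal illustration under stated assumptions, not the authors' released implementation; the class names, part count, token split, and dimensions are all hypothetical.

import torch
import torch.nn as nn

class SkillModularizationAttention(nn.Module):
    """Hypothetical sketch: map a policy observation to per-body-part skill embeddings."""
    def __init__(self, obs_dim, num_parts=5, embed_dim=64, num_tokens=8, num_heads=4):
        super().__init__()
        self.num_tokens = num_tokens
        self.embed_dim = embed_dim
        # Encode the flat observation into a small set of tokens for attention.
        self.obs_encoder = nn.Linear(obs_dim, num_tokens * embed_dim)
        # One learnable query per body part (e.g. torso, left/right arm, left/right leg).
        self.part_queries = nn.Parameter(torch.randn(num_parts, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, obs):                                      # obs: (B, obs_dim)
        batch = obs.shape[0]
        tokens = self.obs_encoder(obs).view(batch, self.num_tokens, self.embed_dim)
        queries = self.part_queries.unsqueeze(0).expand(batch, -1, -1)
        # Each part query attends to the observation tokens to form its skill embedding.
        part_embeddings, _ = self.attn(queries, tokens, tokens)  # (B, num_parts, embed_dim)
        return part_embeddings

class PartController(nn.Module):
    """Hypothetical low-level controller for one body part, conditioned on its skill embedding."""
    def __init__(self, obs_dim, embed_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + embed_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs, skill_embedding):
        return self.net(torch.cat([obs, skill_embedding], dim=-1))

# Illustrative usage with arbitrary sizes:
#   obs = torch.randn(4, 358)
#   embeddings = SkillModularizationAttention(obs_dim=358)(obs)   # (4, 5, 64)

Splitting skills per body part in this way supplies the structural inductive bias the abstract refers to: each controller only has to model the action space of its own part, while the shared attention module coordinates the parts through the observation.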

Related Material


@InProceedings{Huang_2025_ICCV,
    author    = {Huang, Yiming and Dou, Zhiyang and Liu, Lingjie},
    title     = {ModSkill: Physical Character Skill Modularization},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {12394-12404}
}