@InProceedings{Gong_2026_CVPR,
    author    = {Gong, Kehong and Wen, Zhengyu and He, Weixia and Xu, Mingxi and Wang, Qi and Zhang, Ning and Li, Zhengyu and Lian, Dongze and Zhao, Wei and He, Xiaoyu and Zhang, Mingyuan},
    title     = {MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {7089-7099}
}
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos
Abstract
Motion capture now underpins content creation far beyond digital humans, yet most pipelines remain species- or template-specific. We formalize this gap as Category-Agnostic Motion Capture (CAMoCap): given a monocular video and an arbitrary rigged 3D asset as a prompt, the goal is to reconstruct a rotation-based animation (e.g., BVH) that directly drives that asset. We present MoCapAnything, a reference-guided, factorized framework that first predicts 3D joint trajectories and then recovers asset-specific rotations via constraint-aware inverse kinematics (IK) fitting. MoCapAnything comprises three learnable modules and a lightweight IK stage: a Reference Prompt Encoder that distills per-joint queries from the asset's skeleton, mesh, and rendered image set; a Video Feature Extractor that computes dense visual descriptors and reconstructs a coarse 4D deforming mesh to bridge the modality gap between RGB tokens and the point-cloud-like joint space; and a Unified Motion Decoder that fuses these cues to produce temporally coherent trajectories. We also curate Truebones Zoo, a dataset of 1,038 motion clips, each providing a standardized triad of skeleton, mesh, and rendered video. Experiments on in-domain benchmarks and in-the-wild videos show that MoCapAnything delivers high-quality skeletal animations and exhibits non-trivial cross-species retargeting across heterogeneous rigs, offering a scalable path toward prompt-based 3D motion capture for arbitrary assets. Code is available on the project page: https://animotionlab.github.io/MoCapAnything/