-
[pdf]
[supp]
[arXiv]
[bibtex]@InProceedings{Jia_2026_CVPR, author = {Jia, Sen and Zhu, Ning and Zhong, Jinqin and Zhou, Jiale and Zhang, Huaping and Hwang, Jenq-Neng and Li, Lei}, title = {RAM: Recover Any 3D Human Motion in-the-Wild}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2026}, pages = {42789-42799} }
RAM: Recover Any 3D Human Motion in-the-Wild
Abstract
Recovering 3D human motion from monocular videos in-the-wild remains challenging due to occlusions, rapid movements, and viewpoint variations. To address these challenges, we introduce **Recover-Anyone Module (RAM)**, a unified framework for real-time and accurate 3D human motion reconstruction. RAM incorporates a motion-aware semantic tracker with adaptive Kalman filtering to achieve robust identity association under severe occlusions and dynamic interactions. A memory-augmented Temporal HMR module further enhances human motion reconstruction by injecting spatio-temporal priors for consistent and smooth motion estimation. Moreover, a lightweight Predictor module forecasts future poses to maintain reconstruction continuity, while a gated combiner adaptively fuses reconstructed and predicted features to ensure coherence and robustness. Experiments on in-the-wild multi-person benchmarks such as PoseTrack and 3DPW, demonstrate that RAM substantially outperforms previous state-of-the-art in both Zero-shot tracking stability and 3D accuracy, offering a generalizable paradigm for markerless 3D human motion capture in-the-wild.
Related Material

