Geometric Neural Distance Fields for Learning Human Motion Priors

Anonymous CVPR 2026 Submission

Abstract

We introduce Neural Riemannian Motion Fields (NRMF), a novel 3D generative human motion prior that enables robust, temporally consistent, and physically plausible 3D motion recovery. Unlike existing VAE- or diffusion-based methods, our higher-order motion prior explicitly models human motion in the zero-level set of a collection of neural distance fields (NDFs) corresponding to pose, transition (velocity), and acceleration dynamics. Our framework is rigorous in the sense that our NDFs are constructed on the product space of joint rotations, their angular velocities, and angular accelerations, respecting the geometry of the underlying articulations. We further introduce: (i) a novel adaptive-step hybrid algorithm for projecting onto the set of plausible motions, and (ii) a novel geometric integrator to "roll out" realistic motion trajectories during test-time optimization and generation. Our experiments show significant and consistent gains: trained on the AMASS dataset, NRMF generalizes across multiple input modalities and to diverse tasks ranging from denoising to motion in-betweening and fitting to partial 2D / 3D observations.

Method Overview

NRMF is a general-purpose, expressive, and robust unconditional motion prior. It models the space of plausible poses (\(\theta\)), transitions (\(\dot{\theta}\)), and accelerations (\(\ddot{\theta}\)) as the zero-level set of a geometric neural distance field, which implicitly captures the data distribution. Poses are depicted alongside their transitions, visualized as blue dots overlaid on the per-joint distributions of learned transitions, and their accelerations, visualized as blue rings around the magnitude distribution of all accelerations.

Method Overview Diagram

We develop projection (\(\Pi\)) and integration algorithms to deploy NRMF into several applications as shown: (i) motion denoising from noisy observations, (ii) motion estimation on in-the-wild videos, (iii) motion in-betweening, and (iv) motion generation.
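To make the idea of a projection (\(\Pi\)) onto a zero-level set concrete, the following is a minimal sketch under stated assumptions, not the adaptive-step hybrid algorithm of NRMF. The learned distance field is replaced by a hypothetical analytic stand-in (`toy_distance`, the distance of a 2D point to the unit circle), the gradient is taken by finite differences, and the step size is adapted with plain backtracking. All function names and tolerances are illustrative.

```python
import math


def toy_distance(x):
    """Hypothetical stand-in for a learned distance field:
    unsigned distance of a 2D point to the unit circle."""
    return abs(math.hypot(x[0], x[1]) - 1.0)


def numerical_grad(f, x, eps=1e-6):
    """Central-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad


def project(f, x, tol=1e-6, max_iters=100):
    """Move x toward the zero-level set of f.

    Each iteration steps along the negative normalized gradient by the
    predicted distance, halving the step until the distance decreases.
    """
    x = list(x)
    for _ in range(max_iters):
        d = f(x)
        if d < tol:
            break
        g = numerical_grad(f, x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm < 1e-12:
            break  # flat gradient: no usable descent direction
        step = 1.0
        while step > 1e-8:
            cand = [xi - step * d * gi / norm for xi, gi in zip(x, g)]
            if f(cand) < d:
                x = cand
                break
            step *= 0.5
        else:
            break  # backtracking failed to reduce the distance
    return x


projected = project(toy_distance, [3.0, 4.0])
```

In this toy setting the query point is pulled radially onto the circle. With a learned field, `f` would be the network's predicted distance and the gradient would come from automatic differentiation.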

Method Overview Diagram

NRMF learns to represent the space of realistic human motion by modeling the zero-level sets of three distinct yet related neural distance fields over {\(\theta\), \(\dot{\theta}\), \(\ddot{\theta}\)}. Each component is trained to predict the distance to the manifold of plausible motion states using motion capture data. The pose field learns which joint configurations are human-like, the transition field captures temporal consistency across frames, and the acceleration field enforces second-order realism by modeling smooth and plausible dynamics. These fields enable projection-based inference and allow NRMF to robustly reconstruct temporally consistent and physically plausible motion.
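The three quantities {\(\theta\), \(\dot{\theta}\), \(\ddot{\theta}\)} can be illustrated for a single joint with a small sketch. It assumes unit quaternions in (w, x, y, z) order, a world-frame convention for angular velocity, and simple forward differences; the exact parameterization and discretization used by NRMF are not specified here, so every name below is hypothetical.

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_log(q):
    """Rotation vector (axis times angle) of a unit quaternion."""
    w, x, y, z = q
    if w < 0.0:  # q and -q encode the same rotation; take the shorter one
        w, x, y, z = -w, -x, -y, -z
    s = math.sqrt(x * x + y * y + z * z)
    if s < 1e-12:
        return (0.0, 0.0, 0.0)
    angle = 2.0 * math.atan2(s, w)
    return (angle * x / s, angle * y / s, angle * z / s)


def angular_velocities(quats, dt):
    """Forward differences on the rotation group:
    omega_t = log(q_{t+1} * conj(q_t)) / dt."""
    out = []
    for q0, q1 in zip(quats[:-1], quats[1:]):
        rv = quat_log(quat_mul(q1, quat_conj(q0)))
        out.append(tuple(c / dt for c in rv))
    return out


def angular_accelerations(omegas, dt):
    """Forward differences of consecutive angular velocities."""
    return [
        tuple((b - a) / dt for a, b in zip(w0, w1))
        for w0, w1 in zip(omegas[:-1], omegas[1:])
    ]


# One joint turning about the z-axis by 0.1 rad per frame at 30 fps.
dt = 1.0 / 30.0
quats = [(math.cos(0.05 * t), 0.0, 0.0, math.sin(0.05 * t)) for t in range(4)]
omegas = angular_velocities(quats, dt)
alphas = angular_accelerations(omegas, dt)
```

Taking the logarithm of the relative rotation, rather than subtracting rotation parameters directly, keeps the velocity estimate consistent with the geometry of the rotation group.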

Method Overview Diagram

Applications


Motion Denoising & Infilling from Noisy Observations

Our method recovers clean and plausible motion from noisy 3D observations, and can also infill the missing motion of body parts and in-between motion. Noisy observations are simulated by adding Gaussian noise to the 3D observations.
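The corruption described above can be sketched as follows. This is an illustrative setup only: the noise level `sigma`, the joint layout, and the choice of hidden joint indices are hypothetical and are not taken from the experiments.

```python
import random


def add_gaussian_noise(joints, sigma, seed=None):
    """Add i.i.d. zero-mean Gaussian noise to every coordinate.

    joints: list of frames, each a list of [x, y, z] joint positions.
    sigma:  noise standard deviation in the same units as the joints
            (hypothetical value, chosen by the caller).
    """
    rng = random.Random(seed)
    return [
        [[c + rng.gauss(0.0, sigma) for c in joint] for joint in frame]
        for frame in joints
    ]


def hide_joints(joints, hidden):
    """Replace joints whose indices are in `hidden` with None
    to mimic partial observations."""
    hidden = set(hidden)
    return [
        [None if j in hidden else list(joint) for j, joint in enumerate(frame)]
        for frame in joints
    ]


# Four frames of a toy two-joint skeleton.
clean = [[[0.0, 1.0, 0.0], [0.1, 0.5, 0.0]] for _ in range(4)]
noisy = add_gaussian_noise(clean, sigma=0.04, seed=0)
partial = hide_joints(noisy, hidden=[1])
```

Fixing the seed makes the corrupted sequence reproducible, which helps when comparing methods on identical inputs.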

Noisy Input / Output
Partial Input / Output

Even under the challenging condition of observations that are both partial and noisy, NRMF still recovers clean and plausible motion.

Input / Ground Truth / Ours / RoHM

Motion Estimation on In-the-wild Videos

Our method recovers clean and plausible motion from in-the-wild RGB-D observations.

Results on PROX, EgoBody, 3DPW and in-the-wild videos.

Motion In-betweening and Generation

Our method can in-between plausible motion from only partially given keyframes, and can generate natural motion from initial poses, while keeping the result temporally consistent and physically plausible.
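Rolling out motion from an initial pose can be illustrated with a generic Lie-group integrator for a single joint. The sketch below is not the geometric integrator of NRMF: it is a plain semi-implicit Euler scheme that updates the angular velocity first and then advances the rotation through the exponential map, so the state stays a unit quaternion without renormalization. The acceleration callback `alpha_fn` is a hypothetical placeholder for whatever supplies accelerations (here, a constant zero).

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_exp(rv):
    """Unit quaternion for a rotation vector (axis times angle)."""
    angle = math.sqrt(sum(c * c for c in rv))
    if angle < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    s = math.sin(0.5 * angle) / angle
    return (math.cos(0.5 * angle), rv[0] * s, rv[1] * s, rv[2] * s)


def rollout(q0, omega0, alpha_fn, dt, steps):
    """Semi-implicit Euler roll-out on the rotation group.

    q0:       initial unit quaternion (w, x, y, z)
    omega0:   initial world-frame angular velocity (rad/s)
    alpha_fn: hypothetical callback (step, q, omega) -> angular acceleration
    """
    q, omega = q0, tuple(omega0)
    trajectory = [q]
    for t in range(steps):
        alpha = alpha_fn(t, q, omega)
        omega = tuple(w + dt * a for w, a in zip(omega, alpha))
        # Left-multiply by the incremental rotation (world-frame convention).
        q = quat_mul(quat_exp(tuple(dt * w for w in omega)), q)
        trajectory.append(q)
    return trajectory


def zero_acceleration(step, q, omega):
    """Placeholder dynamics: constant angular velocity."""
    return (0.0, 0.0, 0.0)


trajectory = rollout(
    q0=(1.0, 0.0, 0.0, 0.0),
    omega0=(0.0, 0.0, 3.0),
    alpha_fn=zero_acceleration,
    dt=1.0 / 30.0,
    steps=3,
)
```

Composing rotations through the exponential map, instead of adding increments to rotation parameters, is what keeps each generated frame a valid rotation.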

In-betweening on partial keyframes.

Generation from an initial pose (common standing pose).