Causal Motion Diffusion Models for Autoregressive Motion Generation

Qing Yu, Akihisa Watanabe, Kent Fujiwara; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 38366-38375

Abstract


Recent advances in motion diffusion models have substantially improved the realism of human motion synthesis. However, existing approaches fall into two categories: full-sequence diffusion models, whose bidirectional generation breaks temporal causality and limits real-time applicability, and autoregressive models, which are prone to instability and error accumulation. In this work, we present Causal Motion Diffusion Models (CMDM), a unified framework for autoregressive motion generation based on a causal diffusion transformer that operates in a semantically aligned latent space. CMDM builds upon a Motion-Language-Aligned Causal VAE (MAC-VAE), which encodes motion sequences into temporally causal latent representations. On top of this latent representation, an autoregressive diffusion transformer is trained using causal diffusion forcing to perform temporally ordered denoising across motion frames. To achieve fast inference, we introduce a frame-wise sampling schedule with causal uncertainty, where each subsequent frame is predicted from partially denoised previous frames. The resulting framework supports high-quality text-to-motion generation, streaming synthesis, and long-horizon motion generation at interactive rates. Experiments on HumanML3D and SnapMoGen demonstrate that CMDM outperforms existing diffusion and autoregressive models in both semantic fidelity and temporal smoothness, while substantially reducing inference latency.
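
The abstract describes the frame-wise sampling schedule only at a high level. As a rough illustration, the NumPy sketch below shows one way a staggered, diffusion-forcing-style schedule could assign noise levels to frames so that each new frame starts denoising while earlier frames are only partially denoised, together with the lower-triangular mask a causal transformer would use. This is an assumption for exposition, not the authors' implementation: the names `framewise_schedule`, `causal_mask`, `steps_per_frame`, and `frame_offset` are hypothetical, and the sketch omits the causal-uncertainty component, which the abstract does not specify.

```python
import numpy as np


def causal_mask(num_frames: int) -> np.ndarray:
    """Boolean attention mask where frame i may attend only to frames j <= i."""
    return np.tril(np.ones((num_frames, num_frames), dtype=bool))


def framewise_schedule(num_frames: int, steps_per_frame: int, frame_offset: int) -> np.ndarray:
    """Hypothetical staggered schedule: noise level of every frame after each global step.

    Returns an array of shape (total_steps, num_frames). A value of
    steps_per_frame means pure noise and 0 means fully denoised. Frame f
    starts denoising at global step f * frame_offset, so with
    frame_offset < steps_per_frame each new frame is predicted while the
    previous frame is still only partially denoised.
    """
    if not 1 <= frame_offset <= steps_per_frame:
        raise ValueError("frame_offset must be in [1, steps_per_frame]")
    total_steps = (num_frames - 1) * frame_offset + steps_per_frame
    steps = np.arange(1, total_steps + 1)[:, None]          # global steps 1..S
    starts = np.arange(num_frames)[None, :] * frame_offset  # first step of each frame
    progress = np.clip(steps - starts, 0, steps_per_frame)  # denoising steps applied so far
    return steps_per_frame - progress


if __name__ == "__main__":
    F, T, K = 4, 4, 2  # frames, steps per frame, offset between frame starts
    sched = framewise_schedule(F, T, K)
    print(sched)
    print("staggered steps:", len(sched), "vs. strictly sequential:", F * T)
```

With 4 frames, 4 denoising steps per frame, and an offset of 2, this sketch needs 10 global steps instead of the 16 a strictly frame-by-frame sampler would use. Setting `frame_offset = steps_per_frame` recovers that sequential case, while smaller offsets overlap more frames and reduce latency at the cost of conditioning on noisier context.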

Related Material


@InProceedings{Yu_2026_CVPR,
    author    = {Yu, Qing and Watanabe, Akihisa and Fujiwara, Kent},
    title     = {Causal Motion Diffusion Models for Autoregressive Motion Generation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {38366-38375}
}