Consistent and Controllable Image Animation with Motion Diffusion Models

Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Tien-Tsin Wong, Yuan-Fang Li, Cunjian Chen; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 7288-7298

Abstract


Diffusion models have achieved significant progress in the task of image animation due to their powerful generative capabilities. However, preserving appearance consistency to the static input image, and avoiding abrupt motion change in the generated animation, remains challenging. In this paper, we introduce Cinemo, a novel image animation approach that aims to achieve better appearance consistency and motion smoothness. The core of Cinemo is to focus on learning the distribution of motion residuals, rather than directly predicting frames as in existing diffusion models. During the inference, we further mitigate the sudden motion changes in the generated video by introducing a novel DCT-based noise refinement strategy. To counteract the over-smoothing of motion, we introduce a dynamics degree control design for better control of the magnitude of motion. Altogether, these strategies enable Cinemo to produce highly consistent, smooth, and motion-controllable results. Extensive experiments compared with several state-of-the-art methods demonstrate the effectiveness and superiority of our proposed approach. In the end, we also demonstrate how our model can be applied for motion transfer or video editing of any given video.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Ma_2025_CVPR, author = {Ma, Xin and Wang, Yaohui and Jia, Gengyun and Chen, Xinyuan and Wong, Tien-Tsin and Li, Yuan-Fang and Chen, Cunjian}, title = {Consistent and Controllable Image Animation with Motion Diffusion Models}, booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)}, month = {June}, year = {2025}, pages = {7288-7298} }