AnimateAnything: Consistent and Controllable Animation for Video Generation
Abstract
We propose a unified approach to controllable video generation in which text prompts, manual motion annotations, and camera-direction guidance can all drive the generated video. Specifically, we design a two-stage algorithm. In the first stage, all control signals are converted into frame-by-frame motion flows. In the second stage, these motion flows serve as guidance for the final video generation. Additionally, large motion variations (such as those from camera movement, object motion, or manual inputs) can destabilize the generated video, causing flickering or the intermittent disappearance of objects. To mitigate this, we transform the temporal feature computation in the video model into a frequency-domain feature computation: frequency-domain signals better capture the essential characteristics of an image, so enforcing consistency of the video's frequency-domain features enhances temporal coherence and reduces flickering in the final generated video.
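The abstract does not give implementation details, but the frequency-domain stabilization idea can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration, not the paper's actual module: it moves per-frame features into the temporal frequency domain with an FFT and penalizes high-frequency energy, which corresponds to frame-to-frame flicker rather than smooth motion. All names (`flicker_suppression_loss`, the `cutoff` parameter, the feature shape) are assumptions made for illustration.

```python
import torch

def flicker_suppression_loss(feats: torch.Tensor, cutoff: int = 4) -> torch.Tensor:
    """Hypothetical regularizer sketching frequency-domain temporal consistency.

    feats: (batch, time, channels) per-frame video features. This is an
    illustrative stand-in, not the paper's formulation.
    """
    # Real FFT along the time axis: each channel becomes a spectrum of
    # time // 2 + 1 complex coefficients describing its temporal variation.
    spec = torch.fft.rfft(feats, dim=1)
    # Temporal frequencies above `cutoff` change faster than plausible
    # motion and appear visually as flicker; penalizing their energy
    # encourages temporally coherent features.
    high = spec[:, cutoff:, :]
    return high.abs().pow(2).mean()

# Usage sketch: 16 frames of 128-dim features for a batch of 2 clips.
feats = torch.randn(2, 16, 128, requires_grad=True)
loss = flicker_suppression_loss(feats)
loss.backward()  # gradients damp high-frequency temporal variation
```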
Related Material

[pdf] [supp] [arXiv]

[bibtex]
@InProceedings{Lei_2025_CVPR,
  author    = {Lei, Guojun and Wang, Chi and Zhang, Rong and Wang, Yikai and Li, Hong and Xu, Weiwei},
  title     = {AnimateAnything: Consistent and Controllable Animation for Video Generation},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
  month     = {June},
  year      = {2025},
  pages     = {27946-27956}
}