Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

Li Hu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 8153-8163

Abstract


Character animation aims to generate character videos from still images via driving signals. Currently, diffusion models have become the mainstream in visual generation research owing to their robust generative capabilities. However, challenges persist in the realm of image-to-video, especially in character animation, where temporally maintaining consistency with the detailed appearance of the character remains a formidable problem. In this paper, we leverage the power of diffusion models and propose a novel framework tailored for character animation. To preserve consistency of intricate appearance features from the reference image, we design ReferenceNet to merge detail features via spatial attention. To ensure controllability and continuity, we introduce an efficient pose guider to direct the character's movements and employ an effective temporal modeling approach to ensure smooth inter-frame transitions between video frames. By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods. Furthermore, we evaluate our method on image animation benchmarks, achieving state-of-the-art results.
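
To make the ReferenceNet idea concrete, the sketch below shows one way detail features from a reference branch could be merged into the denoising features via spatial attention. It is a loose approximation for illustration only, not the authors' implementation; the class name, shapes, and the use of cross-attention over concatenated tokens are assumptions.

# Hedged sketch: merging reference features into denoising features via
# spatial attention (illustrative approximation, not the paper's code).
import torch
import torch.nn as nn

class SpatialReferenceAttention(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, denoise_feat, ref_feat):
        # denoise_feat, ref_feat: (B, C, H, W) feature maps from the
        # denoising branch and the reference branch at the same resolution.
        b, c, h, w = denoise_feat.shape
        q = denoise_feat.flatten(2).transpose(1, 2)               # (B, H*W, C)
        # Let queries attend to both their own tokens and the reference tokens.
        kv = torch.cat([q, ref_feat.flatten(2).transpose(1, 2)], dim=1)
        out, _ = self.attn(self.norm(q), kv, kv)
        out = out.transpose(1, 2).reshape(b, c, h, w)
        return denoise_feat + out                                  # residual merge

In the described architecture, such a spatial-attention layer takes the place of the self-attention layers in the denoising network so that reference detail is injected at every resolution; the exact token arrangement and normalization above are illustrative choices.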

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Hu_2024_CVPR,
    author    = {Hu, Li},
    title     = {Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {8153-8163}
}