VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 9212-9221

Abstract


Text-to-video diffusion models have advanced video generation significantly. However, customizing these models to generate videos with tailored motions presents a substantial challenge. Specifically, they encounter hurdles in (1) accurately reproducing motion from a target video and (2) creating diverse visual variations. For example, straightforward extensions of static image customization methods to video often lead to intricate entanglements of appearance and motion data. To tackle this, here we present the Video Motion Customization (VMC) framework, a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models. Our approach introduces a novel motion distillation objective using residual vectors between consecutive noisy latent frames as a motion reference. The diffusion process then preserves low-frequency motion trajectories while mitigating high-frequency, motion-unrelated noise in image space. We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts. Our code and data can be found at: https://video-motion-customization.github.io/
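To make the motion distillation idea above concrete, the following is a minimal PyTorch sketch. It assumes video latents shaped (B, F, C, H, W) and uses a cosine-alignment loss between frame-to-frame residual vectors; the function names are hypothetical, and the paper's exact objective and training loop may differ in detail.

```python
import torch
import torch.nn.functional as F

def motion_vectors(latents: torch.Tensor) -> torch.Tensor:
    """Residual vectors between consecutive latent frames.

    latents: (B, F, C, H, W) noisy video latents.
    Returns: (B, F-1, C, H, W) frame-to-frame residuals, used here
    as the motion reference.
    """
    return latents[:, 1:] - latents[:, :-1]

def motion_distillation_loss(pred_latents: torch.Tensor,
                             target_latents: torch.Tensor) -> torch.Tensor:
    """Illustrative motion distillation objective (an assumption, not
    the paper's verbatim loss): align the predicted residual vectors
    with those of the target video via cosine similarity, so that
    motion trajectories are matched independently of their magnitude.
    During VMC-style tuning, only the temporal attention layers of the
    video diffusion model would receive gradients from this loss.
    """
    v_pred = motion_vectors(pred_latents)
    v_tgt = motion_vectors(target_latents)
    # Flatten channel/spatial dims, then compare each frame pair:
    # cos has shape (B, F-1).
    cos = F.cosine_similarity(v_pred.flatten(2), v_tgt.flatten(2), dim=-1)
    return (1.0 - cos).mean()
```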

Related Material


@InProceedings{Jeong_2024_CVPR,
    author    = {Jeong, Hyeonho and Park, Geon Yeong and Ye, Jong Chul},
    title     = {VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {9212-9221}
}