LAMP: Learn A Motion Pattern for Few-Shot Video Generation

Wu, Ruiqi; Chen, Liangyu; Yang, Tong; Guo, Chunle; Li, Chongyi; Zhang, Xiangyu

Ruiqi Wu, Liangyu Chen, Tong Yang, Chunle Guo, Chongyi Li, Xiangyu Zhang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 7089-7098

Abstract

In this paper, we present a few-shot text-to-video framework, LAMP, which enables a text-to-image diffusion model to Learn A specific Motion Pattern with 8 to 16 videos on a single GPU. Unlike existing methods, which require a large amount of training resources or learn motions that are precisely aligned with template videos, it achieves a trade-off between the degree of generation freedom and the resource costs for model training. Specifically, we design a motion-content decoupled pipeline that uses an off-the-shelf text-to-image model for content generation, so that our tuned video diffusion model mainly focuses on motion learning. The well-developed text-to-image techniques can provide visually pleasing and diverse content as generation conditions, which greatly improves video quality and generation freedom. To capture the features of the temporal dimension, we expand the pre-trained 2D convolution layers of the T2I model into our novel temporal-spatial motion learning layers and modify the attention blocks to the temporal level. Additionally, we develop an effective inference trick, shared-noise sampling, which can improve the stability of videos without additional computational cost. Our method can also be flexibly applied to other tasks, e.g., real-world image animation and video editing. Extensive experiments demonstrate that LAMP can effectively learn the motion pattern from limited data and generate high-quality videos. The code and models are available at https://rq-wu.github.io/projects/LAMP.
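The shared-noise sampling idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption: the function name `shared_noise`, the mixing weight `alpha`, and the variance-preserving blend are hypothetical choices for illustration, not the authors' implementation. The sketch draws one base noise tensor, mixes it into the independent per-frame noise, and thereby correlates the initial latents of all frames at no extra sampling cost.

```python
import numpy as np

def shared_noise(num_frames, frame_shape, alpha=0.2, seed=0):
    """Return initial noise of shape (num_frames, *frame_shape).

    Hypothetical sketch: each frame's noise is a blend of one shared
    base noise and its own independent noise. `alpha` in [0, 1] sets
    how strongly the frames are tied together (assumed parameter).
    """
    rng = np.random.default_rng(seed)
    # One noise tensor shared by every frame.
    base = rng.standard_normal(frame_shape)
    # Independent noise for each frame.
    per_frame = rng.standard_normal((num_frames, *frame_shape))
    # Variance-preserving blend: alpha^2 + (1 - alpha^2) = 1, so each
    # element of the result still has unit variance.
    return alpha * base[None] + np.sqrt(1.0 - alpha**2) * per_frame

# Example: 16 frames of 4x32x32 latents.
noise = shared_noise(16, (4, 32, 32), alpha=0.2)
print(noise.shape)
```

With `alpha = 0` the frames start from fully independent noise, and with `alpha = 1` every frame starts from the same noise; intermediate values trade per-frame variation against temporal consistency.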

Related Material


@InProceedings{Wu_2024_CVPR,
    author    = {Wu, Ruiqi and Chen, Liangyu and Yang, Tong and Guo, Chunle and Li, Chongyi and Zhang, Xiangyu},
    title     = {LAMP: Learn A Motion Pattern for Few-Shot Video Generation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {7089-7098}
}