Generative Image Dynamics

Zhengqi Li, Richard Tucker, Noah Snavely, Aleksander Holynski; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 24142-24153

Abstract


We present an approach to modeling an image-space prior on scene motion. Our prior is learned from a collection of motion trajectories extracted from real video sequences depicting natural, oscillatory dynamics of objects such as trees, flowers, candles, and clothes swaying in the wind. We model dense, long-term motion in the Fourier domain as spectral volumes, which we find are well-suited to prediction with diffusion models. Given a single image, our trained model uses a frequency-coordinated diffusion sampling process to predict a spectral volume, which can be converted into a motion texture that spans an entire video. Along with an image-based rendering module, the predicted motion representation can be used for a number of downstream applications, such as turning still images into seamlessly looping videos, or allowing users to interact with objects in real images, producing realistic simulated dynamics (by interpreting the spectral volumes as image-space modal bases). See our project page for more results: generative-dynamics.github.io
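To make the spectral-volume-to-motion-texture step concrete, here is a minimal sketch (not the authors' code) of how a per-pixel set of low-frequency Fourier coefficients could be converted into a dense displacement sequence with an inverse FFT, as the abstract describes. The array shapes, the number of coefficients K, and the function name are illustrative assumptions.

```python
import numpy as np

def spectral_volume_to_motion_texture(spectral_volume, num_frames):
    """Convert a spectral volume into a motion texture (a sketch, not the paper's code).

    spectral_volume: complex array of shape (K, H, W, 2) holding the first K
        Fourier coefficients of each pixel's x/y displacement trajectory.
    Returns a real-valued motion texture of shape (num_frames, H, W, 2),
        i.e. a 2D displacement field for every output frame.
    """
    K, H, W, _ = spectral_volume.shape
    # Place the K predicted low-frequency terms into a full one-sided
    # spectrum, zero-padding the higher frequencies.
    full_spectrum = np.zeros((num_frames // 2 + 1, H, W, 2), dtype=np.complex128)
    full_spectrum[:K] = spectral_volume
    # Inverse real FFT along the frequency axis yields real per-pixel
    # trajectories spanning the whole video.
    return np.fft.irfft(full_spectrum, n=num_frames, axis=0)

# Example (hypothetical sizes): animate a 256x256 image for 150 frames
# from K = 16 coefficients per pixel.
coeffs = np.random.randn(16, 256, 256, 2) + 1j * np.random.randn(16, 256, 256, 2)
motion = spectral_volume_to_motion_texture(coeffs, num_frames=150)
print(motion.shape)  # (150, 256, 256, 2)
```

Because the representation is a small set of Fourier coefficients rather than raw per-frame flow, a handful of frequency bands suffices for the quasi-periodic motions the prior targets, and looping videos follow naturally from the periodicity of the basis.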

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Li_2024_CVPR,
    author    = {Li, Zhengqi and Tucker, Richard and Snavely, Noah and Holynski, Aleksander},
    title     = {Generative Image Dynamics},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {24142-24153}
}