Latent Flow Diffusion for Deepfake Video Generation

Aashish Chandra K, Aashutosh A V, Srijan Das, Abhijit Das; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 3781-3790

Abstract


Image-to-video generation with a conditional identity swap, popularly known as deepfake generation, aims to synthesize a new video for a target identity, guided by an image of the target and a video of a source identity. The biggest challenge of this task lies in simultaneously generating realistic spatial appearance and temporal dynamics corresponding to the given target image and source video. In this paper, we propose a deepfake generation technique using novel latent flow diffusion (LFD), which predicts an optical-flow sequence in the latent space from the given source video to warp the given target image. Compared to previous work on video diffusion, our proposed LFD can swap spatial details while maintaining temporal information, by utilizing the spatial content of the given target image and the latent flow of the source video. Our model consists of three stages: a flow predictor captures the optical flow of the source video, two-fold Transformer encoding layers predict the driving frame, and a conditioned image-to-video generator guided by the driving frame produces the final deepfake video. We conducted multiple experiments, and our proposed model consistently outperformed prior video diffusion models for deepfake generation.
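No code accompanies this abstract, so the following is a minimal PyTorch sketch of how the three-stage pipeline described above could be wired together. All module names (FlowPredictor, DrivingFramePredictor, warp), layer sizes, and tensor shapes are illustrative assumptions, not the authors' implementation; stage 3, the conditioned image-to-video diffusion generator, is only indicated by the warped latent it would consume.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FlowPredictor(nn.Module):
    # Stage 1 (layer sizes are illustrative assumptions): regress a
    # latent optical-flow field from a pair of consecutive source frames.
    def __init__(self, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.flow_head = nn.Conv2d(latent_dim, 2, 3, padding=1)  # (dx, dy)

    def forward(self, frame_a, frame_b):
        # concatenate the frame pair along channels, encode, regress flow
        return self.flow_head(self.encoder(torch.cat([frame_a, frame_b], 1)))


class DrivingFramePredictor(nn.Module):
    # Stage 2: "two-fold" Transformer encoding layers that fuse
    # target-image tokens with latent-flow tokens into a driving frame.
    def __init__(self, dim=64, depth=2, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, tokens):  # tokens: (batch, num_tokens, dim)
        return self.encoder(tokens)


def warp(latent, flow):
    # Warp a latent feature map with a dense flow field via grid_sample;
    # the flow is expressed in normalized [-1, 1] coordinates.
    b, _, h, w = latent.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=latent.device),
                            torch.linspace(-1, 1, w, device=latent.device),
                            indexing="ij")
    base = torch.stack((xs, ys), -1).unsqueeze(0).expand(b, -1, -1, -1)
    grid = base + flow.permute(0, 2, 3, 1)  # displace the sampling grid
    return F.grid_sample(latent, grid, align_corners=True)


# Toy usage (shapes illustrative): drive the target latent with source motion.
flow_net = FlowPredictor()
f0, f1 = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)  # source frames
flow = flow_net(f0, f1)                        # latent flow, (1, 2, 16, 16)
z_target = torch.randn(1, 64, 16, 16)          # encoded target image (stand-in)
z_warped = warp(z_target, flow)                # input to the stage-3 generator

The sketch keeps the warp in latent space, matching the abstract's claim that spatial content comes from the target image while temporal dynamics come from the source video's latent flow.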

Related Material


[pdf]
[bibtex]
@InProceedings{K_2024_CVPR,
    author    = {K, Aashish Chandra and A V, Aashutosh and Das, Srijan and Das, Abhijit},
    title     = {Latent Flow Diffusion for Deepfake Video Generation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {3781-3790}
}