RFDM: Residual Flow Diffusion Models for Video Editing

Salehi, Mohammadreza; Noroozi, Mehdi; Morreale, Luca; Chavhan, Ruchika; Chadwick, Malcolm; Gil Couto Pimentel Ramos, Alberto; Mehrotra, Abhinav

Mohammadreza Salehi, Mehdi Noroozi, Luca Morreale, Ruchika Chavhan, Malcolm Chadwick, Alberto Gil Couto Pimentel Ramos, Abhinav Mehrotra; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 43514-43524

Abstract

Instructional video editing applies edits to an input video using only text prompts, enabling intuitive natural-language control. Despite rapid progress, most methods still require fixed-length inputs and substantial compute. Meanwhile, autoregressive video generation enables efficient variable-length synthesis, yet remains under-explored for video editing. We introduce a causal, efficient video editing model that edits variable-length videos frame by frame. For efficiency, we start from a 2D image-to-image (I2I) diffusion model and adapt it to video-to-video (V2V) editing by conditioning the edit at time step t on the model's prediction at t-1. To leverage videos' temporal redundancy, we propose a new I2I diffusion forward process formulation that encourages the model to predict the residual between the target output and the previous prediction. We call this the Residual Flow Diffusion Model (RFDM), which focuses the denoising process on changes between consecutive frames. Moreover, we propose a new benchmark that better ranks state-of-the-art methods by faithfulness for video editing tasks. Trained on paired video data for global/local style transfer and object removal, RFDM surpasses I2I-based methods and competes with fully spatiotemporal (3D) V2V models, while matching the compute of image models and scaling independently of input video length. More content can be found on the RFDM page: https://smsd75.github.io/RFDM_page/
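
The causal, frame-by-frame editing loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the names `edit_video_causally`, `predict_residual`, and `toy_invert_model` are hypothetical, frames are flattened lists of floats rather than image tensors, the diffusion sampling itself is collapsed into a single residual-prediction call, and using the input frame as the anchor for the first step is an assumption, since the abstract does not say how the first frame is handled.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Frame = List[float]  # toy stand-in for an image tensor (flattened pixels)

# Hypothetical signature: (input frame, previous edited frame, prompt) -> residual
ResidualModel = Callable[[Frame, Frame, str], Frame]


def edit_video_causally(
    frames: Iterable[Frame],
    prompt: str,
    predict_residual: ResidualModel,
) -> Iterator[Frame]:
    """Edit frames one at a time, conditioning each step on the previous output.

    Only the previous edited frame is kept as state, so the per-frame work
    does not depend on how long the video is.
    """
    prev_out: Optional[Frame] = None
    for frame in frames:
        # Assumption: with no previous prediction, anchor on the input frame.
        anchor = frame if prev_out is None else prev_out
        # The model predicts the change relative to the anchor,
        # not the full edited frame.
        residual = predict_residual(frame, anchor, prompt)
        out = [a + r for a, r in zip(anchor, residual)]
        prev_out = out
        yield out


def toy_invert_model(frame: Frame, prev_out: Frame, prompt: str) -> Frame:
    """Toy stand-in for a learned model: the target edit is 1 - x per pixel,
    so the ideal residual is (target - previous output)."""
    return [(1.0 - x) - p for x, p in zip(frame, prev_out)]


if __name__ == "__main__":
    video = [[0.0, 0.25], [0.0, 0.5], [1.0, 0.5]]
    for edited in edit_video_causally(video, "invert colors", toy_invert_model):
        print(edited)
```

Because the function is a generator, it accepts a video of any length and emits each edited frame as soon as it is computed. When consecutive frames are similar, the residual the model must predict is small, which is the temporal redundancy the abstract refers to.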

BibTeX

@InProceedings{Salehi_2026_CVPR,
    author    = {Salehi, Mohammadreza and Noroozi, Mehdi and Morreale, Luca and Chavhan, Ruchika and Chadwick, Malcolm and Gil Couto Pimentel Ramos, Alberto and Mehrotra, Abhinav},
    title     = {RFDM: Residual Flow Diffusion Models for Video Editing},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {43514-43524}
}