@InProceedings{Alzayer_2026_CVPR,
    author    = {Alzayer, Hadi and Zhang, Yunzhi and Geng, Chen and Huang, Jia-Bin and Wu, Jiajun},
    title     = {Coupled Diffusion Sampling for Training-Free Multi-View Image Editing},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {43686-43696}
}
Coupled Diffusion Sampling for Training-Free Multi-View Image Editing
Abstract
Given a collection of multi-view images, we perform consistent multi-view editing with a training-free framework using pre-trained 2D editing models and a generative multi-view model. While 2D editing models can independently edit each image in a set of multi-view images of a 3D scene, they do not maintain consistency across views. Existing approaches typically rely on explicit 3D representations to average out inconsistencies, but they suffer from lengthy optimization, instability under sparse-view settings, and blurry results. We approach the problem through a different lens, using the 2D editing model to guide a multi-view generative model during diffusion sampling. This is achieved through our novel coupled diffusion sampling process. We concurrently sample two trajectories, one from a multi-view image distribution and one from a 2D edited image distribution, and connect the samples with a coupling term. Effectively, the two models guide each other during sampling, and the resulting sample from the multi-view model remains consistent while satisfying the desired edit. We validate the effectiveness and generality of this framework on three distinct multi-view image editing tasks, and demonstrate its applicability across various model architectures. We further illustrate the effects of coupling on state-of-the-art image and video generation models, highlighting the potential of our method beyond multi-view editing.
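The idea of two sampling trajectories joined by a coupling term can be illustrated with a toy example. The sketch below is not the paper's algorithm: it assumes two 1D Gaussian targets in place of the multi-view and 2D-edited image distributions, plain Langevin dynamics in place of diffusion sampling, and a quadratic attraction between the two samples as the coupling term. All function names and parameters (`coupled_langevin`, `coupling`, `mean_gap`) are hypothetical.

```python
import random


def score_gaussian(x, mean, std):
    """Score (gradient of the log-density) of a 1D Gaussian N(mean, std^2)."""
    return (mean - x) / (std ** 2)


def coupled_langevin(mean_a, mean_b, std=1.0, coupling=0.0,
                     steps=500, step_size=0.01, seed=0):
    """Run two Langevin chains, each pulled toward the other's current sample.

    ASSUMPTION: the coupling is the gradient of -(coupling / 2) * (x_a - x_b)^2.
    This is an illustrative choice, not the coupling term used in the paper.
    """
    rng = random.Random(seed)
    x_a, x_b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    noise_scale = (2.0 * step_size) ** 0.5
    for _ in range(steps):
        # Each chain follows its own target score plus the coupling pull.
        grad_a = score_gaussian(x_a, mean_a, std) - coupling * (x_a - x_b)
        grad_b = score_gaussian(x_b, mean_b, std) - coupling * (x_b - x_a)
        # Update both chains simultaneously from the previous pair of samples.
        x_a, x_b = (
            x_a + step_size * grad_a + noise_scale * rng.gauss(0.0, 1.0),
            x_b + step_size * grad_b + noise_scale * rng.gauss(0.0, 1.0),
        )
    return x_a, x_b


def mean_gap(coupling, runs=50):
    """Average |x_a - x_b| over several seeds for targets centred at -2 and +2."""
    total = 0.0
    for seed in range(runs):
        x_a, x_b = coupled_langevin(-2.0, 2.0, coupling=coupling, seed=seed)
        total += abs(x_a - x_b)
    return total / runs


if __name__ == "__main__":
    print("mean gap without coupling:", mean_gap(0.0))
    print("mean gap with coupling:   ", mean_gap(10.0))
```

In this toy setting, increasing `coupling` draws the two samples closer together, while each chain is still driven by its own target's score. This mirrors, in a loose sense, the abstract's description of the two models guiding each other during sampling.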