A Latent Space of Stochastic Diffusion Models for Zero-Shot Image Editing and Guidance

Chen Henry Wu, Fernando De la Torre; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 7378-7387

Abstract


Diffusion models generate images by iterative denoising. Recent work has shown that by making the denoising process deterministic, one can encode real images into latent codes of the same size as the image, which can be used for image editing. This paper explores the possibility of defining a latent space even when the denoising process remains stochastic. Recall that, in stochastic diffusion models, Gaussian noise is added at each denoising step, and all of these noise vectors can be concatenated to form a latent code. This results in a latent space of much higher dimensionality than the original image. We demonstrate that this latent space of stochastic diffusion models can be used in the same way as that of deterministic diffusion models in two applications. First, we propose CycleDiffusion, a method for zero-shot and unpaired image editing using stochastic diffusion models, which improves performance over its deterministic counterpart. Second, we demonstrate unified, plug-and-play guidance in the latent spaces of deterministic and stochastic diffusion models.
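The core observation above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the "image" is a 4-dimensional vector, `mu` is a placeholder for a trained denoiser mean, and `sigma` is a constant noise scale; all of these are assumptions for illustration. The sketch shows that, for a stochastic sampler x_{t-1} = mu(x_t, t) + sigma * eps_t, one can invert each step of a given trajectory to recover the noises, and that the concatenated latent z = (x_T, eps_T, ..., eps_1) has (T+1) times the dimensionality of the image:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 10, 4  # number of denoising steps, "image" dimensionality

def mu(x, t):
    # Placeholder denoiser mean (stands in for a trained network mu_theta).
    return 0.9 * x + 0.01 * t

sigma = 0.1  # per-step noise scale, kept constant for simplicity

def sample(x_T, eps):
    """Stochastic sampling: x_{t-1} = mu(x_t, t) + sigma * eps, step by step.
    eps[i] is the noise used at the i-th step (t runs from T down to 1)."""
    x, traj = x_T, [x_T]
    for i, t in enumerate(range(T, 0, -1)):
        x = mu(x, t) + sigma * eps[i]
        traj.append(x)
    return traj

# Generate a trajectory with fresh Gaussian noise at every step.
eps_true = rng.standard_normal((T, d))
x_T = rng.standard_normal(d)
traj = sample(x_T, eps_true)

def encode(traj):
    """Invert each sampling step to recover the noise it used."""
    return np.stack([(traj[i + 1] - mu(traj[i], t)) / sigma
                     for i, t in enumerate(range(T, 0, -1))])

eps_rec = encode(traj)

# The latent code z = (x_T, eps_T, ..., eps_1) has (T+1)*d entries,
# much more than the d-dimensional image.
z = np.concatenate([x_T, eps_rec.ravel()])
print(z.size, d)

# Replaying the sampler with the recovered noises reproduces the trajectory.
assert np.allclose(sample(x_T, eps_rec)[-1], traj[-1])
```

The round-trip property at the end is what makes the latent code usable for editing: a real trajectory can be encoded into noises, and re-running the (possibly modified) sampler from that code reproduces or edits the image.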

Related Material


[bibtex]
@InProceedings{Wu_2023_ICCV,
  author    = {Wu, Chen Henry and De la Torre, Fernando},
  title     = {A Latent Space of Stochastic Diffusion Models for Zero-Shot Image Editing and Guidance},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2023},
  pages     = {7378-7387}
}