A Plug-and-Play Approach for Robust Image Editing in Text-to-Image Diffusion Models
Abstract
With the advancement of diffusion models, a wide range of image editing techniques has been developed. To support them, various inversion methods have been introduced to preserve the original content. However, these inversion methods are often unstable and fail to reconstruct certain images, particularly when applied to high-resolution diffusion models equipped with deep U-Nets. To address this issue, we propose a novel plug-and-play Residual Linear Interpolation (RLI) method. During the forward process, the method operates within the self-attention mechanism and interpolates between the attention values before and after the attention computation. This interpolation mitigates abrupt changes in the attention map, enabling smoother transitions in spatial representations and reducing unintended distortions of the original content. Our method is compatible with various existing diffusion model variants, inversion techniques, and image editing approaches. In particular, it resolves the reconstruction failure observed when Null-text Inversion is applied to SDXL, where the null-text optimization fails to converge. In addition, we demonstrate that, when combined with diverse inversion and editing methods across multiple diffusion models, our approach achieves superior preservation of the original content, both quantitatively and qualitatively, without compromising editing performance.
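
As a rough illustration of the mechanism described in the abstract, the following PyTorch sketch wraps a standard self-attention module and linearly blends its input and output tokens. This is a minimal sketch under stated assumptions, not the authors' implementation: the wrapper class RLISelfAttention and the fixed blending weight alpha are illustrative, and the paper's actual interpolation schedule and the point in the U-Net where it hooks in are not specified by this abstract.

import torch
import torch.nn as nn

class RLISelfAttention(nn.Module):
    # Hypothetical sketch of RLI: run self-attention, then linearly
    # interpolate between the pre- and post-attention token values so
    # the attention step cannot alter the spatial representation abruptly.
    def __init__(self, dim: int, num_heads: int = 8, alpha: float = 0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.alpha = alpha  # assumed blending weight; not from the paper

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention: queries, keys, and values all come from x.
        out, _ = self.attn(x, x, x, need_weights=False)
        # Residual linear interpolation between the values before
        # (input x) and after (attention output) the computation.
        return (1.0 - self.alpha) * x + self.alpha * out

x = torch.randn(2, 64, 320)      # batch of 2, 64 spatial tokens, dim 320
y = RLISelfAttention(dim=320)(x)
print(y.shape)                   # torch.Size([2, 64, 320])

Because the wrapper only blends the input and output of an existing attention block, it can in principle be dropped into a pretrained model without retraining, which is what makes the approach plug-and-play.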
Related Material

BibTeX:
@InProceedings{Jo_2025_ICCV,
  author    = {Jo, Hyunwook and Maeng, Jiseung and Park, Jun Hyung and Ahn, Namhyuk and Park, In Kyu},
  title     = {A Plug-and-Play Approach for Robust Image Editing in Text-to-Image Diffusion Models},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2025},
  pages     = {4380-4389}
}