Null text-guided interactive image editing for diffusion models

Wang, Jing; Luo, Hao

Jing Wang, Hao Luo; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 1906-1915

Abstract

With the remarkable success of large-scale Text-to-Image (T2I) models, controllable image synthesis that precisely aligns with user intentions has garnered increasing interest. This paper introduces NullDrag, a novel method designed for drag-style interactive image editing with diffusion models. Prior works treat this kind of editing as spatial control and solve it through latent modifications. However, diffusion latents contain layout information but remain oblivious to semantic content, which limits editing capacity. Based on the discovery of high consistency between textual embeddings and the semantic space in diffusion models, we propose to manipulate text embeddings instead of latents to achieve better semantic-aware editing. Specifically, a given NULL-textual prompt is used to bind the semantic content of the image being edited through LoRA fine-tuning of the diffusion UNet. We then embed the drag information into the text embeddings by optimization and guide the subsequent denoising steps with the refined embeddings. Further, we use an attention control module to improve generation quality. Our approach surpasses state-of-the-art (SOTA) methods on the drag-style interactive editing benchmark DragBench, with better consistency and editing quality.
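
The core idea of optimizing an embedding while the model stays frozen can be illustrated with a toy sketch. The abstract does not state the loss NullDrag uses, so the objective below is an assumption: it asks the feature at a target position to match the feature originally found at a handle position. The linear `feature_map`, the matrix `W`, and all function names are hypothetical stand-ins, not the authors' implementation.

```python
# Toy illustration only: a frozen linear map stands in for the diffusion UNet,
# and a small vector stands in for the text embedding.

def feature_map(W, e):
    """Frozen stand-in model: one scalar feature per spatial position."""
    return [sum(w * x for w, x in zip(row, e)) for row in W]

def drag_loss(W, e, ref, target):
    """Squared gap between the target-position feature and the reference."""
    return (feature_map(W, e)[target] - ref) ** 2

def optimize_embedding(W, e0, handle, target, lr=0.05, steps=200):
    """Gradient descent on the embedding only; W is never updated."""
    ref = feature_map(W, e0)[handle]  # handle feature, held constant
    e = list(e0)
    for _ in range(steps):
        residual = feature_map(W, e)[target] - ref
        # Gradient of residual**2 with respect to e is 2 * residual * W[target].
        e = [x - lr * 2.0 * residual * w for x, w in zip(e, W[target])]
    return e

# Hypothetical frozen weights (4 positions, 3-dimensional embedding).
W = [
    [1.0, 0.0, 0.5],
    [0.2, -0.4, 0.9],
    [-0.3, 0.8, 0.1],
    [0.6, 0.5, -0.7],
]
e0 = [0.5, -1.0, 0.25]
handle, target = 0, 3

ref = feature_map(W, e0)[handle]
e_opt = optimize_embedding(W, e0, handle, target)
loss_before = drag_loss(W, e0, ref, target)
loss_after = drag_loss(W, e_opt, ref, target)
```

In this sketch only the embedding changes, which mirrors the abstract's choice to edit text embeddings rather than latents. A real pipeline would replace the linear map with UNet features and feed the refined embedding into the remaining denoising steps.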

Related Material


@InProceedings{Wang_2025_ICCV,
    author    = {Wang, Jing and Luo, Hao},
    title     = {Null text-guided interactive image editing for diffusion models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {1906-1915}
}