@InProceedings{Chinchuthakun_2025_ICCV,
    author    = {Chinchuthakun, Worameth and Saengja, Tossaporn and Tritrong, Nontawat and Rewatbowornwong, Pitchaporn and Khungurn, Pramook and Suwajanakorn, Supasorn},
    title     = {LUSD: Localized Update Score Distillation for Text-Guided Image Editing},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {15298-15307}
}
LUSD: Localized Update Score Distillation for Text-Guided Image Editing
Abstract
While diffusion models show promising results in image editing given a target prompt, achieving both prompt fidelity and background preservation remains difficult. Recent works have introduced score distillation techniques that leverage the rich generative prior of text-to-image diffusion models to solve this task without additional fine-tuning. However, these methods often struggle with tasks such as object insertion. Our investigation of these failures reveals significant variations in gradient magnitude and spatial distribution, making hyperparameter tuning highly input-specific or altogether unsuccessful. To address this, we propose two simple yet effective modifications: attention-based spatial regularization and gradient filtering-normalization, both aimed at reducing these variations during gradient updates. Experimental results show our method outperforms state-of-the-art score distillation techniques in prompt fidelity, producing more successful edits while preserving the background. Users also preferred our method over state-of-the-art techniques across three metrics, preferring it by 58-64% overall.
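The abstract does not give implementation details, but the two proposed modifications can be loosely illustrated. The following is a hypothetical sketch, not the paper's actual algorithm: it masks a score-distillation gradient with an attention-derived spatial mask, clips outlier magnitudes (a stand-in for "filtering"), and rescales to unit RMS so that update magnitude is stable across inputs. The function name, percentile threshold, and mask are all illustrative assumptions.

```python
import numpy as np

def regularized_update(grad, attn_mask, clip_pct=95.0, eps=1e-8):
    """Illustrative sketch (not the paper's method):
    1) attention-based spatial regularization: zero out the gradient
       outside the attended edit region;
    2) gradient filtering: clip extreme values at a percentile threshold;
    3) normalization: rescale to unit RMS so step size is input-agnostic."""
    g = grad * attn_mask                            # spatially localize the update
    thresh = np.percentile(np.abs(g), clip_pct)     # hypothetical outlier threshold
    g = np.clip(g, -thresh, thresh)                 # filter extreme gradient values
    rms = np.sqrt(np.mean(g ** 2)) + eps
    return g / rms                                  # normalize overall magnitude

rng = np.random.default_rng(0)
grad = rng.normal(size=(8, 8)) * 100.0              # gradient with large, varying scale
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0                                # hypothetical attended edit region
update = regularized_update(grad, mask)
```

After this step, the update is zero outside the masked region and has unit RMS regardless of the raw gradient's scale, which is the kind of variance reduction the abstract describes.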