ZONE: Zero-Shot Instruction-Guided Local Editing
Abstract
Recent advances in vision-language models like Stable Diffusion have shown remarkable power in creative image synthesis and editing. However, most existing text-to-image editing methods encounter two obstacles: first, the text prompt needs to be carefully crafted to achieve good results, which is neither intuitive nor user-friendly; second, they are insensitive to local edits and can irreversibly affect non-edited regions, leaving obvious editing traces. To tackle these problems, we propose a Zero-shot instructiON-guided local image Editing approach, termed ZONE. We first convert the editing intent from the user-provided instruction (e.g., "make his tie blue") into specific image editing regions through InstructPix2Pix. We then propose a Region-IoU scheme for precise image layer extraction from an off-the-shelf segmentation model. We further develop an FFT-based edge smoother for seamless blending between the layer and the image. Our method allows for arbitrary manipulation of a specific region with a single instruction while preserving the rest. Extensive experiments demonstrate that our ZONE achieves remarkable local editing results and user-friendliness, outperforming state-of-the-art methods. Code is available at https://github.com/lsl001006/ZONE.
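
For concreteness, below is a minimal Python sketch of the three-stage pipeline the abstract describes. It is an illustration under stated assumptions, not the authors' implementation: a thresholded pixel difference between the source and the InstructPix2Pix output stands in for the paper's edit-region localization, a plain mask IoU stands in for the Region-IoU scheme, the edge smoother is approximated by an FFT low-pass filter on the extracted mask, and all file paths, thresholds, and hyperparameters are placeholders.

# Sketch of a ZONE-style local edit: instruction-guided editing via
# InstructPix2Pix, layer extraction from SAM masks by IoU with the
# rough edit region, and FFT-smoothed alpha blending.
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def rough_edit_mask(src, edited, thresh=0.1):
    """Binary mask of pixels changed by the edit (assumption: a simple
    per-pixel L2 difference stands in for the paper's localization)."""
    a = np.asarray(src, dtype=np.float32) / 255.0
    b = np.asarray(edited, dtype=np.float32) / 255.0
    return np.linalg.norm(a - b, axis=-1) > thresh

def region_iou_layer(masks, edit_mask):
    """Pick the SAM mask with the highest IoU against the rough edit
    region (a plain IoU stand-in for the paper's Region-IoU scheme)."""
    best, best_iou = None, 0.0
    for m in masks:
        seg = m["segmentation"]
        inter = np.logical_and(seg, edit_mask).sum()
        union = np.logical_or(seg, edit_mask).sum()
        iou = inter / max(union, 1)
        if iou > best_iou:
            best, best_iou = seg, iou
    return best if best is not None else edit_mask

def fft_smooth(mask, keep=0.05):
    """Soften mask edges by keeping only low spatial frequencies."""
    f = np.fft.fftshift(np.fft.fft2(mask.astype(np.float32)))
    h, w = mask.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h / 2, xx - w / 2)
    f[dist > keep * min(h, w)] = 0  # zero out high frequencies
    out = np.real(np.fft.ifft2(np.fft.ifftshift(f)))
    return np.clip(out, 0.0, 1.0)

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # checkpoint path is a placeholder
mask_gen = SamAutomaticMaskGenerator(sam)

src = Image.open("input.png").convert("RGB").resize((512, 512))
edited = pipe("make his tie blue", image=src,
              num_inference_steps=20, image_guidance_scale=1.5).images[0]

rough = rough_edit_mask(src, edited)
layer = region_iou_layer(mask_gen.generate(np.asarray(src)), rough)
alpha = fft_smooth(layer)[..., None]  # soft matte for seamless blending
out = np.asarray(edited) * alpha + np.asarray(src) * (1 - alpha)
Image.fromarray(out.astype(np.uint8)).save("output.png")

The key design point the abstract emphasizes is preserved here: only the extracted layer is composited back, so the non-edited regions come verbatim from the source image rather than from the diffusion output.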
Related Material

@InProceedings{Li_2024_CVPR,
    author    = {Li, Shanglin and Zeng, Bohan and Feng, Yutang and Gao, Sicheng and Liu, Xiuhui and Liu, Jiaming and Li, Lin and Tang, Xu and Hu, Yao and Liu, Jianzhuang and Zhang, Baochang},
    title     = {ZONE: Zero-Shot Instruction-Guided Local Editing},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {6254-6263}
}