Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting

Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 18359-18369

Abstract

Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to the input text prompt while remaining consistent with the input image. We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplished by incorporating object detectors for proposing inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high-resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images, exploring objects, attributes, and scenes. Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment, such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion. We also find that, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.
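The abstract does not give implementation details for the object-masking step, so the following is only a minimal illustrative sketch of the general idea: during training, the inpainting mask is drawn from an object detector's proposals rather than from an arbitrary region. All names here (`Box`, `box_mask`, `random_box`, `propose_mask`) are hypothetical, the detector output is assumed to be a list of axis-aligned boxes, and the fallback to a random box when nothing is detected is an assumption, not something stated in the source.

```python
import random
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, half-open on x1/y1.
Box = Tuple[int, int, int, int]


def box_mask(height: int, width: int, box: Box) -> List[List[int]]:
    """Return a height x width binary mask with 1s inside the box."""
    x0, y0, x1, y1 = box
    return [
        [1 if (y0 <= y < y1 and x0 <= x < x1) else 0 for x in range(width)]
        for y in range(height)
    ]


def random_box(height: int, width: int, rng: random.Random) -> Box:
    """Sample a non-empty random box (assumed fallback, not from the paper)."""
    x0 = rng.randrange(0, width)
    x1 = rng.randrange(x0 + 1, width + 1)
    y0 = rng.randrange(0, height)
    y1 = rng.randrange(y0 + 1, height + 1)
    return (x0, y0, x1, y1)


def propose_mask(
    height: int, width: int, detections: List[Box], rng: random.Random
) -> List[List[int]]:
    """Pick a detected object box as the training mask when one is available."""
    box = rng.choice(detections) if detections else random_box(height, width, rng)
    return box_mask(height, width, box)


if __name__ == "__main__":
    rng = random.Random(0)
    # One hypothetical detection covering two pixels of a 4 x 6 image.
    for row in propose_mask(4, 6, [(1, 1, 3, 2)], rng):
        print(row)
```

Masking a whole detected object forces the model to rely on the text prompt to decide what fills the region, which is the intuition behind the text-image alignment gains the abstract reports.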

Related Material

@InProceedings{Wang_2023_CVPR,
    author    = {Wang, Su and Saharia, Chitwan and Montgomery, Ceslee and Pont-Tuset, Jordi and Noy, Shai and Pellegrini, Stefano and Onoe, Yasumasa and Laszlo, Sarah and Fleet, David J. and Soricut, Radu and Baldridge, Jason and Norouzi, Mohammad and Anderson, Peter and Chan, William},
    title     = {Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {18359-18369}
}