Salient Object-Aware Background Generation using Text-Guided Diffusion Models

Amir Erfan Eshratifar, Joao V.B. Soares, Kapil Thadani, Shaunak Mishra, Mikhail Kuznetsov, Yueh-Ning Ku, Paloma De Juan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 7489-7499


Generating background scenes for salient objects plays a crucial role in various domains, including creative design and e-commerce, as it enhances the presentation and context of subjects by integrating them into tailored environments. Background generation can be framed as a task of text-conditioned outpainting, where the goal is to extend image content beyond a salient object's boundaries on a blank background. Although popular diffusion models for text-guided inpainting can also be used for outpainting by mask inversion, they are trained to fill in missing parts of an image rather than to place an object into a scene. Consequently, when used for background creation, inpainting models frequently extend the salient object's boundaries and thereby change the object's identity, a phenomenon we call "object expansion." This paper introduces a model for adapting inpainting diffusion models to the salient object outpainting task using Stable Diffusion and ControlNet architectures. We present a series of qualitative and quantitative results across models and datasets, including a newly proposed metric to measure object expansion that does not require any human labeling. Compared to Stable Diffusion 2.0 Inpainting, our proposed approach reduces object expansion by 3.6x on average with no degradation in standard visual metrics across multiple datasets. We will also release our code and model checkpoints for reproducibility.
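The abstract notes that an inpainting model can be repurposed for outpainting by mask inversion. A minimal sketch of that inversion step follows. It assumes a binary salient-object mask stored as nested lists (1 = object pixel, 0 = blank background) and the common inpainting convention that a mask value of 1 marks pixels to regenerate. `invert_mask` is a hypothetical helper for illustration, not code from the paper.

```python
from typing import List

# Binary mask as nested lists: 1 = salient-object pixel, 0 = blank background.
Mask = List[List[int]]


def invert_mask(object_mask: Mask) -> Mask:
    """Build an outpainting mask from a salient-object mask (hypothetical helper).

    Assumes the inpainting convention that 1 marks pixels the model should
    regenerate and 0 marks pixels to keep. Inverting the object mask therefore
    asks the model to fill in everything except the object.
    """
    for row in object_mask:
        for v in row:
            if v not in (0, 1):
                raise ValueError("mask values must be 0 or 1")
    return [[1 - v for v in row] for row in object_mask]


if __name__ == "__main__":
    # A 3x4 canvas with a small object in the middle.
    obj = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(invert_mask(obj))
```

The inverted mask keeps the object pixels nominally fixed. Nothing in it, however, stops the model from painting object-like content just outside the object's boundary, which is the "object expansion" behavior the paper sets out to reduce.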

@InProceedings{Eshratifar_2024_CVPR,
    author    = {Eshratifar, Amir Erfan and Soares, Joao V.B. and Thadani, Kapil and Mishra, Shaunak and Kuznetsov, Mikhail and Ku, Yueh-Ning and De Juan, Paloma},
    title     = {Salient Object-Aware Background Generation using Text-Guided Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {7489-7499}
}