PSG-Adapter: Controllable Planning Scene Graph for Improving Text-to-Image Diffusion

Yi Gao; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 2371-2387

Abstract


Significant progress in text-to-image generation has been driven by diffusion models, highlighting their crucial role and exceptional impact on the field. However, diffusion models often fail to comprehend spatial relationships within text. This limitation stems primarily from their difficulty in constructing logical spatial relationships, such as distinguishing between foreground and background elements. Additionally, their limited text-encoding capacity exacerbates inconsistencies between the generated images and their textual prompts. In this paper, we introduce the Planning Scene Graph Adapter (PSG-Adapter). Our approach employs the Planning Scene Graph (PSG) method to decompose the original text prompt into distinct sub-prompts, each carrying explicit spatial relationships. The proposed Planning Scene Graph ControlNet (PSG-ControlNet) then infuses this additional spatial information into the original text embeddings. By fully exploiting the implicit spatial relationships within the text, our method achieves fine-grained control over the composition of the generated images, a gain that is particularly notable when generating multiple objects and complex spatial relationships. Extensive experiments verify the efficacy of PSG-Adapter in generating spatially coherent images and complex scenes with multiple objects and relationships.
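To illustrate the kind of decomposition the abstract describes, the following is a minimal, hypothetical sketch (not the paper's actual PSG parser, which is not given here): it splits a prompt into (subject, relation, object) triples by scanning for spatial-relation keywords, yielding sub-prompts with explicit spatial relationships. The relation vocabulary and the keyword-matching rule are illustrative assumptions; the paper's method would use a learned or more principled scene-graph construction.

```python
# Hypothetical sketch of prompt decomposition into spatial triples.
# NOT the paper's implementation; a toy, rule-based stand-in.

# Assumed single-word spatial-relation vocabulary (illustrative only).
SPATIAL_RELATIONS = ("on", "under", "above", "below", "beside", "behind")

def decompose_prompt(prompt: str) -> list[tuple[str, str, str]]:
    """Split a prompt into (subject, relation, object) triples by scanning
    for spatial-relation keywords. Each triple is one spatial sub-prompt."""
    words = prompt.lower().split()
    triples = []
    for i, word in enumerate(words):
        # A relation keyword must have text on both sides to form a triple.
        if word in SPATIAL_RELATIONS and 0 < i < len(words) - 1:
            subject = " ".join(words[:i])
            obj = " ".join(words[i + 1:])
            triples.append((subject, word, obj))
    return triples

print(decompose_prompt("a red cube on a wooden table"))
# → [('a red cube', 'on', 'a wooden table')]
```

Each triple could then serve as one sub-prompt whose spatial relation is injected alongside the original text embedding, in the spirit of the PSG-ControlNet conditioning described above.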

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Gao_2024_ACCV,
  author    = {Gao, Yi},
  title     = {PSG-Adapter: Controllable Planning Scene Graph for Improving Text-to-Image Diffusion},
  booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
  month     = {December},
  year      = {2024},
  pages     = {2371-2387}
}