Relation Rectification in Diffusion Model

Yinwei Wu, Xingyi Yang, Xinchao Wang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 7685-7694

Abstract


Despite their exceptional generative abilities, large text-to-image (T2I) diffusion models, much like skilled but careless artists, often struggle to depict visual relationships between objects accurately. Our analysis shows that this issue arises from a misaligned text encoder that struggles to interpret specific relationships and to differentiate the logical order of the associated objects. To resolve this, we introduce a novel task termed Relation Rectification, which aims to refine the model so that it accurately represents a given relationship it initially fails to generate. To address this task, we propose a solution built on a Heterogeneous Graph Convolutional Network (HGCN), which models the directional relationships between relation terms and the corresponding objects within input prompts. Specifically, we optimize the HGCN on a pair of prompts with identical relational words but reversed object orders, supplemented by a few reference images. The lightweight HGCN adjusts the text embeddings generated by the text encoder so that the textual relation is accurately reflected in the embedding space. Crucially, our method keeps the parameters of the text encoder and the diffusion model unchanged, preserving the model's robust performance on unrelated descriptions. We validated our approach on a newly curated dataset of diverse relational data, demonstrating both quantitative and qualitative improvements in generating images with precise visual relations. Project page: https://wuyinwei-hah.github.io/rrnet.github.io/
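The abstract does not specify the HGCN's layer design, so the following is only a minimal NumPy sketch of why directed, typed edges can make a relation embedding sensitive to object order. It assumes an R-GCN-style layer (one weight matrix per edge type plus a self-loop matrix), a three-node graph (subject, relation, object), random weights, and random vectors standing in for text-encoder token embeddings. The names `EDGE_TYPES`, `init_params`, `hgcn_layer` and `rectify` are hypothetical illustrations, not the authors' code, and the sketch omits training on prompt pairs and reference images entirely.

```python
import numpy as np

# Hypothetical edge types for a "subject -relation-> object" prompt.
EDGE_TYPES = ("subj_to_rel", "rel_to_obj")

def init_params(dim, seed=0):
    """Random weights: one matrix per edge type plus a self-loop matrix (assumed design)."""
    rng = np.random.default_rng(seed)
    params = {t: rng.normal(scale=0.1, size=(dim, dim)) for t in EDGE_TYPES}
    params["self"] = rng.normal(scale=0.1, size=(dim, dim))
    return params

def hgcn_layer(x, edges, params):
    """One relation-typed (R-GCN-style) graph convolution.

    x:     (num_nodes, dim) token embeddings used as node features.
    edges: directed (src, dst, edge_type) triples; messages flow src -> dst.
    Returns a (num_nodes, dim) array of updates; x itself is not modified.
    """
    out = x @ params["self"]                 # self-loop term for every node
    for src, dst, etype in edges:
        out[dst] += x[src] @ params[etype]   # edge-type-specific message
    return np.tanh(out)

def rectify(subj, rel, obj, params):
    """Return the relation-token embedding after a residual HGCN update."""
    x = np.stack([subj, rel, obj])           # node 0: subject, 1: relation, 2: object
    edges = [(0, 1, "subj_to_rel"), (1, 2, "rel_to_obj")]
    return (x + hgcn_layer(x, edges, params))[1]  # residual: frozen embedding + learned update

dim = 8
rng = np.random.default_rng(1)
subj, rel, obj = rng.normal(size=(3, dim))   # stand-ins for, e.g., "cat", "on top of", "box"
params = init_params(dim)

forward = rectify(subj, rel, obj, params)    # "cat on top of box"
reversed_ = rectify(obj, rel, subj, params)  # "box on top of cat"

# Directed, typed edges make the relation embedding depend on which object is the subject...
print(np.allclose(forward, reversed_))       # False

# ...whereas an order-agnostic mean over the same tokens cannot tell the two prompts apart.
bag_forward = np.stack([subj, rel, obj]).mean(axis=0)
bag_reversed = np.stack([obj, rel, subj]).mean(axis=0)
print(np.allclose(bag_forward, bag_reversed))  # True
```

In this sketch only the subject sends a message into the relation node, so swapping the two objects changes the relation token's update. The residual form (original embedding plus an update) is one simple way to adjust embeddings while leaving the text encoder itself untouched, in the spirit of the abstract's statement that encoder and diffusion-model parameters are kept unchanged.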

Related Material


@InProceedings{Wu_2024_CVPR,
    author    = {Wu, Yinwei and Yang, Xingyi and Wang, Xinchao},
    title     = {Relation Rectification in Diffusion Model},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {7685-7694}
}