GR-Diffusion: Graph-Guided Relational-Aware Diffusion via Attention Alignment

Xiaochen Liu, Xiaoting Xi, Chao Yin, Xiaoqiang Li, Daoguo Dong; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026, pp. 3759-3768

Abstract


Large-scale text-to-image diffusion models excel at generating high-fidelity images but offer limited control over complex human-object interactions (HOI), because layout constraints and interaction constraints produce conflicting guidance. In this work, we introduce Graph-guided Relational-aware Diffusion (GR-Diffusion), a training-free framework for precise control over complex HOI in diffusion models. GR-Diffusion uses a Target Scene Graph (TSG) as a structural scaffold that steers the model's internal attention at each denoising step through two plug-and-play modules. First, to control the spatial layout, the Node Alignment Guidance module guides the cross-attention maps by reducing the structural deviation between the TSG and a Dynamic Attention Graph (DAG) derived from those maps. Then, to reinforce semantic interactions, the Edge Enhancement Guidance module constructs a relational mask from the corrected cross-attention maps and injects it into the self-attention layers. GR-Diffusion achieves state-of-the-art control over both spatial layout and semantic interactions on the HICO-DET benchmark, and significantly outperforms existing baselines in both HOI detection score and image fidelity, as measured by FID and KID.
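The abstract does not say how the Dynamic Attention Graph is built, how its deviation from the TSG is measured, or how the relational mask enters self-attention. The NumPy sketch below shows one plausible reading, under assumptions chosen only for illustration: each graph node is an entity token whose position is the attention-weighted centroid of its cross-attention map; the structural deviation is a squared error on node positions plus a squared error on relative displacements along relation edges; and the relational mask is a thresholded subject/object region added as a bias to self-attention logits. All names and parameters (`centroid`, `structural_deviation`, `relational_mask`, `masked_self_attention`, `thresh`, `strength`) are hypothetical and are not the authors' implementation. In a real training-free pipeline, the deviation would presumably be differentiated with respect to the noisy latent (an assumption); this sketch only computes its value.

```python
import numpy as np


def centroid(attn_map):
    """Attention-weighted (row, col) centroid of one H x W map (hypothetical node position)."""
    h, w = attn_map.shape
    rows, cols = np.mgrid[0:h, 0:w]
    p = attn_map / attn_map.sum()  # normalize so the map behaves like a distribution
    return np.array([(p * rows).sum(), (p * cols).sum()])


def structural_deviation(cross_attn, target_centers, edges):
    """Hypothetical node + edge deviation between an attention graph and a target scene graph.

    cross_attn:     (K, H, W) cross-attention maps, one per entity token (graph node).
    target_centers: (K, 2) target (row, col) node positions taken from the scene-graph layout.
    edges:          list of (subject_idx, object_idx) relation pairs.
    """
    centers = np.stack([centroid(a) for a in cross_attn])
    # Node term: each entity's attention should sit at its target position.
    node_term = np.sum((centers - target_centers) ** 2)
    # Edge term: each relation should preserve the target relative displacement.
    edge_term = 0.0
    for i, j in edges:
        observed = centers[j] - centers[i]
        expected = target_centers[j] - target_centers[i]
        edge_term += np.sum((observed - expected) ** 2)
    return node_term + edge_term


def relational_mask(cross_attn, edges, thresh=0.5):
    """Hypothetical (HW, HW) boolean mask linking subject and object regions of each relation."""
    k, h, w = cross_attn.shape
    flat = cross_attn.reshape(k, h * w)
    # A token's region is every pixel whose attention is at least `thresh` times its peak.
    regions = flat >= thresh * flat.max(axis=1, keepdims=True)
    mask = np.zeros((h * w, h * w), dtype=bool)
    for i, j in edges:
        union = regions[i] | regions[j]
        mask |= union[:, None] & union[None, :]  # pixels in the pair may attend to each other
    return mask


def masked_self_attention(q, k, v, mask, strength=2.0):
    """Self-attention whose logits receive an additive bonus inside the (hypothetical) mask."""
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits = logits + strength * mask
    logits -= logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attn = rng.random((2, 8, 8))  # two entity tokens on an 8 x 8 latent grid
    targets = np.array([[2.0, 2.0], [5.0, 5.0]])
    print("deviation:", structural_deviation(attn, targets, edges=[(0, 1)]))
    mask = relational_mask(attn, edges=[(0, 1)])
    x = rng.standard_normal((64, 16))
    print("output shape:", masked_self_attention(x, x, x, mask).shape)
```

The edge term compares relative displacements rather than absolute positions, so it penalizes a broken subject-object arrangement even when both nodes drift together; the additive logit bias is just one simple way to "inject" a mask, chosen here for clarity.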

Related Material


BibTeX:
@InProceedings{Liu_2026_CVPR,
    author    = {Liu, Xiaochen and Xi, Xiaoting and Yin, Chao and Li, Xiaoqiang and Dong, Daoguo},
    title     = {GR-Diffusion: Graph-Guided Relational-Aware Diffusion via Attention Alignment},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
    month     = {June},
    year      = {2026},
    pages     = {3759-3768}
}