Hierarchical Graph Attention Network for Visual Relationship Detection

Li Mi, Zhenzhong Chen; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 13886-13895

Abstract


Visual Relationship Detection (VRD) aims to describe the relationship between two objects by providing a structural triplet shown as . Existing graph-based methods mainly represent the relationships by an object-level graph, which ignores to model the triplet-level dependencies. In this work, a Hierarchical Graph Attention Network (HGAT) is proposed to capture the dependencies on both object-level and triplet-level. Object-level graph aims to capture the interactions between objects, while the triplet-level graph models the dependencies among relation triplets. In addition, prior knowledge and attention mechanism are introduced to fix the redundant or missing edges on graphs that are constructed according to spatial correlation. With these approaches, nodes are allowed to attend over their spatial and semantic neighborhoods' features based on the visual or semantic feature correlation. Experimental results on the well-known VG and VRD datasets demonstrate that our model significantly outperforms the state-of-the-art methods.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Mi_2020_CVPR,
author = {Mi, Li and Chen, Zhenzhong},
title = {Hierarchical Graph Attention Network for Visual Relationship Detection},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}