Prototype-Based Embedding Network for Scene Graph Generation

Zheng, Chaofan; Lyu, Xinyu; Gao, Lianli; Dai, Bo; Song, Jingkuan

Chaofan Zheng, Xinyu Lyu, Lianli Gao, Bo Dai, Jingkuan Song; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 22783-22792

Abstract

Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs. However, due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category, e.g., "man-eating-pizza, giraffe-eating-leaf", and the severe inter-class similarity between different classes, e.g., "man-holding-plate, man-eating-pizza", in model's latent space. The above challenges prevent current SGG methods from acquiring robust features for reliable relation prediction. In this paper, we claim that predicate's categoryinherent semantics can serve as class-wise prototypes in the semantic space for relieving the above challenges caused by the diverse visual appearances. To the end, we propose the Prototype-based Embedding Network (PE-Net), which models entities/predicates with prototype-aligned compact and distinctive representations and establishes matching between entity pairs and predicates in a common embedding space for relation recognition. Moreover, Prototypeguided Learning (PL) is introduced to help PE-Net efficiently learn such entity-predicate matching, and Prototype Regularization (PR) is devised to relieve the ambiguous entity-predicate matching caused by the predicate's semantic overlap. Extensive experiments demonstrate that our method gains superior relation recognition capability on SGG, achieving new state-of-the-art performances on both Visual Genome and Open Images datasets.

Related Material

[pdf] [supp] [arXiv]

[bibtex]

@InProceedings{Zheng_2023_CVPR, author = {Zheng, Chaofan and Lyu, Xinyu and Gao, Lianli and Dai, Bo and Song, Jingkuan}, title = {Prototype-Based Embedding Network for Scene Graph Generation}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2023}, pages = {22783-22792} }