Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation

Gengcong Yang, Jingyi Zhang, Yong Zhang, Baoyuan Wu, Yujiu Yang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 12527-12536


To generate "accurate" scene graphs, almost all existing methods predict pairwise relationships in a deterministic manner. However, we argue that visual relationships are often semantically ambiguous. Specifically, inspired by linguistic knowledge, we classify the ambiguity into three types: Synonymy Ambiguity, Hyponymy Ambiguity, and Multi-view Ambiguity. The ambiguity naturally leads to the issue of implicit multi-label, motivating the need for diverse predictions. In this work, we propose a novel plug-and-play Probabilistic Uncertainty Modeling (PUM) module. It models each union region as a Gaussian distribution, whose variance measures the uncertainty of the corresponding visual content. Compared to the conventional deterministic methods, such uncertainty modeling brings stochasticity of feature representation, which naturally enables diverse predictions. As a byproduct, PUM also manages to cover more fine-grained relationships and thus alleviates the issue of bias towards frequent relationships. Extensive experiments on the large-scale Visual Genome benchmark show that combining PUM with newly proposed ResCAGCN can achieve state-of-the-art performances, especially under the mean recall metric. Furthermore, we show the universal effectiveness of PUM by plugging it into some existing models and provide insightful analysis of its ability to generate diverse yet plausible visual relationships.
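
The idea of representing a union region as a Gaussian and sampling from it can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the log-variance parameterization, and the linear predicate classifier are all assumptions made for illustration. It only shows how drawing z = mu + sigma * eps with eps ~ N(0, I) turns one region into a range of possible predicate predictions, with larger variance giving more diverse outputs.

```python
import math
import random


def sample_union_feature(mu, log_var, rng):
    """Draw one stochastic feature z = mu + sigma * eps, with eps ~ N(0, I).

    mu and log_var are hypothetical per-dimension outputs of some encoder
    for a union region; sigma is recovered as exp(0.5 * log_var).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def predict_predicate(z, weights):
    """Return the index of the highest-scoring predicate.

    weights is a hypothetical linear classifier: one row per predicate.
    """
    scores = [sum(w * x for w, x in zip(row, z)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def diverse_predictions(mu, log_var, weights, n_samples, seed=0):
    """Classify n_samples independent draws from the same Gaussian."""
    rng = random.Random(seed)
    return [predict_predicate(sample_union_feature(mu, log_var, rng), weights)
            for _ in range(n_samples)]
```

With a near-zero variance every draw collapses to the mean, so the sketch behaves like a deterministic model and returns the same predicate each time. With a large variance, repeated draws for the same region can land on different predicates, which is the source of diversity the abstract describes.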

Related Material

@InProceedings{Yang_2021_CVPR,
    author    = {Yang, Gengcong and Zhang, Jingyi and Zhang, Yong and Wu, Baoyuan and Yang, Yujiu},
    title     = {Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {12527-12536}
}