The Topology and Language of Relationships in the Visual Genome Dataset

David Abou Chacra, John Zelek; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 4860-4868

Abstract


The Visual Genome Dataset is the de facto standard dataset used in scene graph generation. It contains a large collection of images with corresponding object and relationship labels. We explore the linguistic aspect of the relationship predicates and find that very few symmetric/inverse relationships (for example, 'above' and 'under') are represented in the dataset. We believe this is linked to human spatial cognition, and posit that labelling bias stemming from human representations of relationships creates asymmetric relationship labels that span the whole dataset. We also perform a 2D topological analysis of the bounding boxes linked by different relationship predicates. This analysis sheds light on certain classes and their ambiguity: the more frequent classes are semantically overloaded and therefore quite confusing. Finally, we show that when the predicates are reduced to more linguistically and topologically well-defined spatial relationships, scene graph generation algorithm performance improves tremendously, but scene graph generators are still far from perfect.
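To make the idea of a 2D topological analysis concrete, the following is a minimal sketch of how the relation between a subject box and an object box might be classified. It is an illustration only, not the authors' procedure: the `Box` type, the `topological_relation` function, and the six relation names are assumptions introduced here, and boxes are assumed to be axis-aligned with `x1 < x2` and `y1 < y2`.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical axis-aligned bounding box (assumes x1 < x2 and y1 < y2)."""
    x1: float
    y1: float
    x2: float
    y2: float


def topological_relation(a: Box, b: Box) -> str:
    """Classify the 2D topological relation of box a with respect to box b.

    The category names are illustrative assumptions, not taken from the paper.
    """
    if a == b:
        return "equal"

    # Extent of the intersection along each axis; a negative value is a gap.
    dx = min(a.x2, b.x2) - max(a.x1, b.x1)
    dy = min(a.y2, b.y2) - max(a.y1, b.y1)

    if dx < 0 or dy < 0:
        return "disjoint"
    if dx == 0 or dy == 0:
        # Boundaries meet, but the interiors do not intersect.
        return "touching"
    if a.x1 <= b.x1 and a.y1 <= b.y1 and a.x2 >= b.x2 and a.y2 >= b.y2:
        return "contains"
    if b.x1 <= a.x1 and b.y1 <= a.y1 and b.x2 >= a.x2 and b.y2 >= a.y2:
        return "inside"
    return "overlapping"


if __name__ == "__main__":
    subject = Box(0, 0, 10, 10)
    obj = Box(2, 2, 5, 5)
    print(topological_relation(subject, obj))
    print(topological_relation(obj, subject))
```

Tallying these categories per predicate over all labelled pairs is one way a predicate that spreads across many topological relations could be flagged as semantically overloaded.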

Related Material


@InProceedings{Chacra_2022_CVPR,
    author    = {Chacra, David Abou and Zelek, John},
    title     = {The Topology and Language of Relationships in the Visual Genome Dataset},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2022},
    pages     = {4860-4868}
}