Attention-Translation-Relation Network for Scalable Scene Graph Generation

Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Koutras, Petros Maragos; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 0-0

Abstract


We find that most Scene Graph Generation approaches suffer from two limitations as they: 1) use generic attention mechanisms and dataset-specific statistics that supersede visual features and 2) treat "no interaction" as an extra, both noisy and dominant, class and prune graph edges manually or applying simple filters. As a result, such approaches do not scale up on different settings and specifications. We propose a three-stage pipeline that employs Multi-Head Attention driven by language and spatial features, Translation Embeddings and Multi-Tasking to detect an interacting pair of objects. Our attentional scheme is able to maximize the visual features' interpretability, as well as to capture the nature of datasets of different scales, while multi-tasking robustly resolves the bias of the background class. We present an experimental overview of the related literature, unveil a multitude of evaluation inconsistencies and provide quantitative and qualitative support with experiments on a variety of datasets, where our approach performs on par or even outperforms current state-of-the-art.

Related Material


[pdf]
[bibtex]
@InProceedings{Gkanatsios_2019_ICCV,
author = {Gkanatsios, Nikolaos and Pitsikalis, Vassilis and Koutras, Petros and Maragos, Petros},
title = {Attention-Translation-Relation Network for Scalable Scene Graph Generation},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2019}
}