Continuous Self-Study: Scene Graph Generation with Self-Knowledge Distillation and Spatial Augmentation
As an extension of visual detection tasks, scene graph generation (SGG) has drawn increasing attention with advances in complex image understanding. However, it still faces two challenges: distinguishing objects with high visual similarity, and discriminating relationships under long-tailed bias. In this paper, we propose a Continuous Self-Study model (CSS) that uses self-knowledge distillation and spatial augmentation to refine the detection of hard samples. We design a long-term memory structure that lets CSS learn its own behavior from context features, so that it can perceive its own hard samples and focus more on similar targets in different scenes. Meanwhile, a fine-grained relative position encoding method is adopted to augment spatial features and supplement relationship information. Experiments on the Visual Genome benchmark show that the proposed CSS achieves clear improvements over previous state-of-the-art methods. Our code is available at https://github.com/LINYE1998/Continuous_Self_Study.
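The abstract names two ingredients without giving their formulas, so the following is only a rough sketch of the general ideas, not the CSS implementation. It assumes a common box-delta encoding for the relative position of a subject and an object, and a temperature-softened KL term in which the "teacher" is the model's own earlier prediction read back from memory. The function names, the 4-dimensional encoding, and the temperature value are all hypothetical choices made for illustration.

```python
import math

def relative_position_features(subj_box, obj_box):
    """Encode obj_box relative to subj_box; boxes are (x1, y1, x2, y2).

    Assumption: a standard 4-d delta encoding (center offsets scaled by
    the subject size, plus log size ratios). The paper's fine-grained
    encoding may differ.
    """
    sw, sh = subj_box[2] - subj_box[0], subj_box[3] - subj_box[1]
    ow, oh = obj_box[2] - obj_box[0], obj_box[3] - obj_box[1]
    sx, sy = subj_box[0] + sw / 2, subj_box[1] + sh / 2
    ox, oy = obj_box[0] + ow / 2, obj_box[1] + oh / 2
    return [(ox - sx) / sw, (oy - sy) / sh,
            math.log(ow / sw), math.log(oh / sh)]

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def self_distillation_loss(current_logits, memory_logits, temperature=2.0):
    """KL(memory || current) over softened class distributions.

    Assumption: memory_logits are the model's own stored predictions
    for the same sample from an earlier training step.
    """
    teacher = softmax(memory_logits, temperature)
    student = softmax(current_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)

# Two equally sized boxes side by side: the object sits one subject-width
# to the right, with no vertical offset and no size change.
features = relative_position_features((0, 0, 2, 2), (2, 0, 4, 2))

# The loss is zero when the current prediction matches the stored one
# and positive when the two disagree.
same = self_distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
drift = self_distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In a sketch like this, the spatial features would be concatenated with the visual features of a subject-object pair before relationship classification, and the distillation term would be added to the ordinary classification loss with some weight. How CSS actually combines them is described in the paper and the linked repository.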