Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features

Xu Yang, Hanwang Zhang, Jianfei Cai; Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 36-52


Because it is prohibitively expensive to completely annotate visual relationships, i.e., the (obj1, rel, obj2) triplets, relationship models are inevitably biased toward object classes with limited pairwise patterns, leading to poor generalization to rare or unseen object combinations. We are therefore interested in learning object-agnostic visual features for more generalizable relationship models. By "agnostic", we mean that the feature is less likely to be biased toward the classes of the paired objects. To alleviate the bias, we propose a novel Shuffle-Then-Assemble pre-training strategy. First, we discard all the triplet relationship annotations in an image, leaving two unpaired object domains without obj1-obj2 alignment. Then, our feature learning task is to recover the possible obj1-obj2 pairs. In particular, we design a cycle of residual transformations between the two domains, where the identity mappings encourage the RoI features to capture shared rather than object-specific visual patterns. Extensive experiments on two visual relationship benchmarks show that, with our pre-trained features, naive relationship models are consistently improved and even outperform other state-of-the-art relationship models.
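As a rough illustration of what a cycle of residual transformations might look like, the toy sketch below maps a feature vector from one object domain to the other and back, with each mapping written as an identity plus a correction term. It is not the authors' implementation: the linear correction, the squared-error round-trip loss, and all names (`residual_transform`, `cycle_loss`, `make_linear`) are assumptions made for this example, and the abstract does not specify the actual architecture or training objective.

```python
import random

def make_linear(dim, scale, rng):
    """Return a random dim x dim weight matrix with small entries (hypothetical helper)."""
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]

def residual_transform(x, weights):
    """Identity mapping plus a linear correction: x + W x (assumed form)."""
    return [xi + sum(w * xj for w, xj in zip(row, x)) for xi, row in zip(x, weights)]

def cycle_loss(x, w_forward, w_backward):
    """Squared error between x and its reconstruction after a round trip."""
    y = residual_transform(x, w_forward)        # obj1 domain -> obj2 domain
    x_rec = residual_transform(y, w_backward)   # obj2 domain -> obj1 domain
    return sum((a - b) ** 2 for a, b in zip(x, x_rec))

rng = random.Random(0)
dim = 4
feature = [rng.uniform(-1, 1) for _ in range(dim)]   # stand-in for an RoI feature
zero = [[0.0] * dim for _ in range(dim)]             # no correction: pure identity
small = make_linear(dim, 0.01, rng)                  # small learned-style correction

print(cycle_loss(feature, zero, zero))    # round trip through pure identity mappings
print(cycle_loss(feature, small, small))  # round trip with small corrections
```

When the correction weights are zero, both mappings reduce to the identity and the feature passes through the cycle unchanged. This is the intuition behind using identity mappings: the feature is shared between the two domains, and only a small residual is left for the transformations to account for.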

Related Material

author = {Yang, Xu and Zhang, Hanwang and Cai, Jianfei},
title = {Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}