Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks

Long Chen, Hanwang Zhang, Jun Xiao, Wei Liu, Shih-Fu Chang; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 1043-1052

Abstract


We propose a novel framework called Semantics-Preserving Adversarial Embedding Network (SP-AEN) for zero-shot visual recognition (ZSL), where test images and their classes are both unseen during training. SP-AEN aims to tackle an inherent problem, semantic loss, in the prevailing family of embedding-based ZSL: some semantics are discarded during training if they are non-discriminative for the training classes, even though they could become critical for recognizing test classes. Specifically, SP-AEN prevents semantic loss by introducing an independent visual-to-semantic space embedder, which disentangles the semantic space into two subspaces for two arguably conflicting objectives: classification and reconstruction. Through adversarial learning of the two subspaces, SP-AEN can transfer semantics from the reconstructive subspace to the discriminative one, thereby improving zero-shot recognition of unseen classes. Compared with prior work, SP-AEN not only improves classification but also generates photo-realistic images, demonstrating the effectiveness of semantic preservation. On four popular benchmarks (CUB, AWA, SUN, and aPY), SP-AEN considerably outperforms other state-of-the-art methods by absolute differences of 12.2%, 9.3%, 4.0%, and 3.6%, respectively, in terms of harmonic mean values.
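The harmonic mean mentioned in the abstract can be illustrated with a minimal sketch. It assumes the definition commonly used in generalized zero-shot evaluation, where the harmonic mean combines accuracy on seen classes with accuracy on unseen classes; the abstract itself does not spell out the formula. The function name `harmonic_mean` and the input values are hypothetical and are not results from the paper.

```python
def harmonic_mean(acc_seen: float, acc_unseen: float) -> float:
    """Harmonic mean of seen-class and unseen-class accuracy.

    Assumed definition: H = 2 * s * u / (s + u).
    """
    total = acc_seen + acc_unseen
    if total == 0:
        # Both accuracies are zero; define the harmonic mean as zero.
        return 0.0
    return 2.0 * acc_seen * acc_unseen / total


if __name__ == "__main__":
    # Hypothetical accuracies chosen for illustration only.
    print(harmonic_mean(0.8, 0.2))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model that scores well on seen classes but poorly on unseen ones receives a low value, which is why the measure suits settings where both class groups matter.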

Related Material


@InProceedings{Chen_2018_CVPR,
author = {Chen, Long and Zhang, Hanwang and Xiao, Jun and Liu, Wei and Chang, Shih-Fu},
title = {Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}