Deep Modality Invariant Adversarial Network for Shared Representation Learning

Kuniaki Saito, Yusuke Mukuta, Yoshitaka Ushiku, Tatsuya Harada; Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2017, pp. 2623-2629

Abstract


In this work, we propose a novel method for shared representation learning that learns a mapping into a common space in which different modalities carry the same information. Our goal is to correctly classify samples of an unseen target modality with a classifier trained in this common space on source-modality samples and their labels. We call these representations modality-invariant representations. A major advantage of the proposed method is that it needs no labels for the target samples to learn the representations. For example, we obtain modality-invariant representations from pairs of images and texts, and then train a text classifier in the modality-invariant space. Although no explicit relationship between images and labels is given, we can expect images to be classified correctly in that space. Our method draws on the theory of domain adaptation, and we propose to use adversarial training to learn the modality-invariant space.
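The abstract does not specify the architecture or the loss functions, so the block below is only a hypothetical toy sketch of the general recipe it describes, not the paper's implementation. Everything in it is an assumption: linear encoders `W_i` (image) and `W_t` (text), a logistic modality discriminator (`v`, `b`), a softmax classifier `C` on the common space, a GAN-style adversarial term with flipped modality labels, a squared-distance pairing term on image-text pairs, the hyperparameters `lr` and `lam`, the helper names `init_params`, `train_step` and `predict_images`, and the synthetic data. Only the text side receives class labels; the image side is classified at the end with the text-trained classifier.

```python
import numpy as np

# Hypothetical toy sketch (not the paper's model): linear encoders, a logistic
# modality discriminator and a softmax classifier, trained with hand-written
# gradients so that only NumPy is needed.


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def init_params(rng, d_img, d_txt, k, n_classes):
    return {
        "W_i": 0.1 * rng.standard_normal((d_img, k)),  # image encoder (assumed linear)
        "W_t": 0.1 * rng.standard_normal((d_txt, k)),  # text encoder (assumed linear)
        "v": np.zeros(k), "b": 0.0,                     # modality discriminator
        "C": np.zeros((k, n_classes)),                  # classifier on the common space
    }


def train_step(p, x_img, x_txt, y_txt, lr=0.02, lam=1.0):
    """One alternating update on n paired (image, text) samples.

    Only the text side uses class labels; images never see a label.
    """
    n = x_img.shape[0]
    z_img, z_txt = x_img @ p["W_i"], x_txt @ p["W_t"]
    z_all = np.vstack([z_img, z_txt])
    m = np.concatenate([np.ones(n), np.zeros(n)])  # modality label: image=1, text=0

    # 1) Discriminator step: minimize BCE for telling image codes from text codes.
    q = sigmoid(z_all @ p["v"] + p["b"])
    v = p["v"] - lr * z_all.T @ (q - m) / (2 * n)
    b = p["b"] - lr * np.mean(q - m)

    # 2) Encoder step against the updated discriminator.
    q = sigmoid(z_all @ v + b)
    # Adversarial term: BCE with flipped modality labels, pushing the encoders
    # toward codes the discriminator cannot tell apart.
    g_z = lam * np.outer(q - (1 - m), v) / (2 * n)
    g_img, g_txt = g_z[:n], g_z[n:]
    # Pairing term (assumed): mean squared distance between paired codes.
    diff = z_img - z_txt
    g_img = g_img + 2 * diff / n
    g_txt = g_txt - 2 * diff / n
    # Classification term: softmax cross-entropy on labeled text codes only.
    g_logits = (softmax(z_txt @ p["C"]) - np.eye(p["C"].shape[1])[y_txt]) / n
    g_txt = g_txt + g_logits @ p["C"].T
    C = p["C"] - lr * z_txt.T @ g_logits

    return {
        "W_i": p["W_i"] - lr * x_img.T @ g_img,
        "W_t": p["W_t"] - lr * x_txt.T @ g_txt,
        "v": v, "b": b, "C": C,
    }


def predict_images(p, x_img):
    # Apply the classifier trained on text codes to image codes.
    return np.argmax(x_img @ p["W_i"] @ p["C"], axis=1)


# Synthetic data: two "modalities" observe the same class-dependent latent
# through different random linear maps.
rng = np.random.default_rng(0)
n, n_classes, k = 120, 3, 4
y = rng.integers(0, n_classes, size=n)
latent = rng.standard_normal((n_classes, 4))[y] + 0.3 * rng.standard_normal((n, 4))
x_img = latent @ rng.standard_normal((4, 10)) / 3
x_txt = latent @ rng.standard_normal((4, 6)) / 3

params = init_params(rng, 10, 6, k, n_classes)
for _ in range(500):
    params = train_step(params, x_img, x_txt, y)
acc = np.mean(predict_images(params, x_img) == y)
print(f"image accuracy with a text-trained classifier: {acc:.2f}")
```

The flipped-label adversarial term is one common way to make the discriminator's job hard; a gradient reversal layer, which negates the discriminator's gradient before it reaches the encoders, is another, and either could be swapped in here. The pairing term is included only because the abstract mentions image-text pairs; whether and how the paper uses pairs is not stated in this section.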

Related Material

@InProceedings{Saito_2017_ICCV,
author = {Saito, Kuniaki and Mukuta, Yusuke and Ushiku, Yoshitaka and Harada, Tatsuya},
title = {Deep Modality Invariant Adversarial Network for Shared Representation Learning},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2017}
}