Bridged Variational Autoencoders for Joint Modeling of Images and Attributes

Ravindra Yadav, Ashish Sardana, Vinay Namboodiri, Rajesh M Hegde; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 1479-1487

Abstract


Generative models have recently shown the ability to generate realistic data and to model data distributions accurately. However, jointly modeling an image together with the attributes it is labeled with requires learning a cross-modal correspondence between images and attribute data. Although images and attributes have completely different statistical properties, there exists an inherent correspondence between them that is challenging to capture. Existing models have aimed to capture this correspondence either through a single joint variational autoencoder or through separate encoder networks whose outputs are then concatenated. We present an alternative: a bridged variational autoencoder that learns cross-modal correspondence by incorporating cross-modal hallucination losses in the latent space. Compared with existing methods, we find that incorporating this information into the network not only yields better generation results but also produces highly distinctive latent embeddings, which increases the accuracy of cross-modally generated results. We validate the proposed method by comparing it with state-of-the-art methods on standard benchmark datasets.
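The abstract does not give the exact form of the cross-modal hallucination loss. Purely as an illustration, assume each modality (image and attributes) has its own VAE encoder that outputs a diagonal Gaussian posterior, described by a mean vector and a log-variance vector. One simple way to "bridge" the two latent spaces is to penalize the divergence between the image posterior and the attribute posterior. The sketch below uses a symmetric KL divergence for that purpose. This is a stand-in chosen for illustration, not the paper's loss, and the function names `gaussian_kl` and `symmetric_latent_bridge` are hypothetical.

```python
import math
from typing import Sequence


def gaussian_kl(
    mu_p: Sequence[float],
    logvar_p: Sequence[float],
    mu_q: Sequence[float],
    logvar_q: Sequence[float],
) -> float:
    """KL(P || Q) for diagonal Gaussians P and Q given by means and log-variances.

    Per dimension: 0.5 * [log(var_q / var_p) + (var_p + (mu_p - mu_q)^2) / var_q - 1].
    """
    total = 0.0
    for mp, lp, mq, lq in zip(mu_p, logvar_p, mu_q, logvar_q):
        var_p, var_q = math.exp(lp), math.exp(lq)
        total += 0.5 * (lq - lp + (var_p + (mp - mq) ** 2) / var_q - 1.0)
    return total


def symmetric_latent_bridge(
    mu_img: Sequence[float],
    logvar_img: Sequence[float],
    mu_attr: Sequence[float],
    logvar_attr: Sequence[float],
) -> float:
    """Hypothetical cross-modal latent term: the average of KL in both directions.

    Assumption: this stands in for the paper's hallucination loss, whose exact
    form is not stated in the abstract.
    """
    forward = gaussian_kl(mu_img, logvar_img, mu_attr, logvar_attr)
    backward = gaussian_kl(mu_attr, logvar_attr, mu_img, logvar_img)
    return 0.5 * (forward + backward)


if __name__ == "__main__":
    # Toy posteriors for a 2-D latent space (made-up numbers, not results).
    mu_img, logvar_img = [0.5, -0.2], [0.1, -0.3]
    mu_attr, logvar_attr = [0.4, 0.0], [0.0, 0.0]
    print(symmetric_latent_bridge(mu_img, logvar_img, mu_attr, logvar_attr))
```

In a training loop, a term like this would be added with some weight to each modality's own VAE objective (reconstruction plus KL to the prior, which is `gaussian_kl` against a zero mean and zero log-variance). The weight, and whether to use a symmetric or one-directional divergence, are design choices that the abstract does not specify.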

Related Material


@InProceedings{Yadav_2020_WACV,
author = {Yadav, Ravindra and Sardana, Ashish and Namboodiri, Vinay and Hegde, Rajesh M},
title = {Bridged Variational Autoencoders for Joint Modeling of Images and Attributes},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {March},
year = {2020}
}