Decomposing Image Generation Into Layout Prediction and Conditional Synthesis

Anna Volokitin, Ender Konukoglu, Luc Van Gool; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 372-373

Abstract


Learning the distribution of multi-object scenes with Generative Adversarial Networks (GANs) is challenging. Guiding the learning with semantic intermediate representations, which are less complex than images, can be a solution. In this article, we investigate splitting the optimisation of generative adversarial networks into two parts: first generating a semantic segmentation mask from noise, and then translating that segmentation mask into an image. We performed experiments on images from the CityScapes dataset and compared our approach to Progressive Growing of GANs (PGGAN), which guides learning through multiscale growing of networks. Examining the structure of generated images through the lens of a segmentation algorithm, we find that our method achieves higher structural consistency in latent-space interpolations and produces generations with better differentiation between distinct objects, while matching the image quality of PGGAN as judged by a user study and a standard GAN evaluation metric.
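The decomposition described above factors the generator into two stages composed end to end: a layout generator that maps noise to a semantic segmentation mask, and a conditional synthesis network that translates the mask into an image. The following is a minimal illustrative sketch of that composition only; both stage functions are stubs (the paper's actual stages are trained GANs), and the 19-class label space and toy resolution are assumptions, not details from the abstract.

```python
import random

NUM_CLASSES = 19  # assumption: the 19 CityScapes training classes
H, W = 4, 8       # toy resolution for illustration

def layout_generator(noise):
    """Stage 1 (stub): map a noise vector to a semantic segmentation mask.

    In the paper this is a generative network; here we just derive a
    deterministic per-pixel class map from the noise.
    """
    rng = random.Random(sum(noise))
    return [[rng.randrange(NUM_CLASSES) for _ in range(W)] for _ in range(H)]

def conditional_synthesis(mask):
    """Stage 2 (stub): translate a segmentation mask into an RGB image.

    In the paper this is a conditional image-synthesis network; here each
    class is painted with a fixed grey level in [0, 255].
    """
    return [[(c * 255 // (NUM_CLASSES - 1),) * 3 for c in row] for row in mask]

def generate(noise):
    # The full generator is the composition of the two stages:
    # noise -> segmentation mask -> image.
    mask = layout_generator(noise)
    image = conditional_synthesis(mask)
    return mask, image

mask, image = generate([1, 2, 3])
```

Splitting the generator this way means the layout stage only has to model scene structure over a small discrete label space, while the synthesis stage only has to model appearance given that structure.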

Related Material


[bibtex]
@InProceedings{Volokitin_2020_CVPR_Workshops,
author = {Volokitin, Anna and Konukoglu, Ender and Van Gool, Luc},
title = {Decomposing Image Generation Into Layout Prediction and Conditional Synthesis},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2020}
}