We are submitting four videos as supplementary material to our article "Decomposing Image Generation into Layout Prediction and Conditional Synthesis". The videos show sequences of images constructed by random walks in the latent space for Progressive GAN (PGGAN) and for the proposed method, which samples semantic segmentation masks and renders them with pix2pix. We use the variant of our method trained with coarse segmentation masks.
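As a rough illustration of how such sequences can be produced, the sketch below generates a random walk through a Gaussian latent space; the dimensionality, step size, and renormalization to the prior shell are illustrative assumptions, not the exact settings used for the videos.

```python
import numpy as np

def latent_random_walk(dim=512, n_steps=120, step_size=0.15, seed=0):
    """Generate a smooth random walk through a Gaussian latent space.

    Each step adds small Gaussian noise to the current latent code and
    renormalizes it to sqrt(dim), the typical radius of a standard
    Gaussian, so the walk stays near the high-density shell of the prior.
    (Hypothetical parameters for illustration only.)
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(dim)
    radius = np.sqrt(dim)
    codes = []
    for _ in range(n_steps):
        z = z + step_size * rng.standard_normal(dim)
        z = z * (radius / np.linalg.norm(z))  # stay on the prior shell
        codes.append(z.copy())
    return np.stack(codes)

# Each row is one latent code; consecutive rows would be fed to a
# generator (PGGAN, or a mask sampler followed by pix2pix) to render
# consecutive video frames.
walk = latent_random_walk()
```

Because consecutive codes differ only by a small perturbation, rendering them in order yields a video whose smoothness reflects the smoothness of the generator's latent space.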

The videos with "images" in the title show sequences of generated images for both methods. The videos with "segmentations" in the title show the semantic segmentations obtained by applying the DeepLab method to the generated images.

We observe that the proposed method yields smoother transitions in the image sequences corresponding to random walks in the latent space, whereas the PGGAN sequences show jittery transitions between consecutive images.

The DeepLab segmentation sequences for the two methods show that PGGAN dissolves objects and makes them appear without much physical consistency. The proposed approach, however, learns to move objects around the scene, removing them from the frame and replacing them with new ones. Note that physical consistency was never enforced during training, yet the model learns to take it into account.