Image Synthesis From Layout With Locality-Aware Mask Adaption

Zejian Li, Jingyu Wu, Immanuel Koh, Yongchuan Tang, Lingyun Sun; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 13819-13828


This paper addresses image synthesis conditioned on a layout, i.e., a set of bounding boxes with object categories. Existing works construct a layout-mask-image pipeline: object masks are generated separately and mapped to their bounding boxes to form a whole semantic segmentation mask (layout-to-mask), from which a new image is generated (mask-to-image). However, overlapping boxes in a layout produce overlapping object masks, which reduces mask clarity and causes confusion in image generation. We hypothesize that generating clean and semantically clear masks is important. This hypothesis is supported by the finding that the performance of the state-of-the-art LostGAN decreases when its input masks are tainted. Motivated by this hypothesis, we propose a Locality-Aware Mask Adaption (LAMA) module that adapts overlapping or nearby object masks during generation. Experimental results show that our proposed model with LAMA outperforms existing approaches in visual fidelity and alignment with input layouts. On COCO-Stuff at 256x256 resolution, our method improves the state-of-the-art FID score from 41.65 to 31.12 and the SceneFID from 22.00 to 18.64.
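The overlap problem described in the abstract can be illustrated with a small sketch. The code below is not the LAMA module and is not taken from the paper; it only shows, under simplifying assumptions, how per-object masks pasted into overlapping bounding boxes end up claiming the same pixels. The function names `paste_masks` and `overlap_fraction`, the nearest-neighbour resizing, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def paste_masks(boxes, masks, size):
    """Resize each object mask into its box and stack the results.

    boxes: list of (x0, y0, x1, y1) in pixels, with x1/y1 exclusive.
    masks: list of 2-D float arrays in [0, 1], one per object.
    Returns an array of shape (num_objects, size, size).
    """
    canvas = np.zeros((len(boxes), size, size), dtype=np.float32)
    for i, ((x0, y0, x1, y1), m) in enumerate(zip(boxes, masks)):
        h, w = y1 - y0, x1 - x0
        # Nearest-neighbour lookup from box pixels back to mask pixels
        # (an illustrative choice, not the paper's resampling scheme).
        rows = np.arange(h) * m.shape[0] // h
        cols = np.arange(w) * m.shape[1] // w
        canvas[i, y0:y1, x0:x1] = m[np.ix_(rows, cols)]
    return canvas


def overlap_fraction(canvas, threshold=0.5):
    """Fraction of covered pixels that are claimed by more than one object."""
    claims = (canvas > threshold).sum(axis=0)
    covered = (claims >= 1).sum()
    return float((claims > 1).sum() / max(covered, 1))


# Two overlapping boxes on an 8x8 canvas, each with a fully "on" mask.
boxes = [(0, 0, 6, 6), (4, 4, 8, 8)]
masks = [np.ones((4, 4), dtype=np.float32), np.ones((2, 2), dtype=np.float32)]
canvas = paste_masks(boxes, masks, size=8)
frac = overlap_fraction(canvas)
```

In this toy layout the two boxes share a 2x2 region, so a nonzero fraction of the covered pixels is assigned to both objects. Ambiguity of this kind in the combined mask is what the abstract identifies as a source of confusion for the mask-to-image stage.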

Related Material

@InProceedings{Li_2021_ICCV,
    author    = {Li, Zejian and Wu, Jingyu and Koh, Immanuel and Tang, Yongchuan and Sun, Lingyun},
    title     = {Image Synthesis From Layout With Locality-Aware Mask Adaption},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {13819-13828}
}