Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation

Cho, Jaemin; Li, Linjie; Yang, Zhengyuan; Gan, Zhe; Wang, Lijuan; Bansal, Mohit

Jaemin Cho, Linjie Li, Zhengyuan Yang, Zhe Gan, Lijuan Wang, Mohit Bansal; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 5280-5289

Abstract

Spatial control is a core capability in controllable image generation. Advancements in layout-guided image generation have shown promising results on in-distribution (ID) datasets with similar spatial configurations. However, it is unclear how these models perform when facing out-of-distribution (OOD) samples with arbitrary, unseen layouts. In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. We benchmark two recent representative layout-guided image generation methods and observe that good ID layout control may not generalize well to arbitrary layouts in the wild (e.g., objects at the boundary). Next, we propose IterInpaint, a new baseline that generates foreground and background regions step by step via inpainting, demonstrating stronger generalizability than existing models on OOD layouts in LayoutBench. We perform quantitative and qualitative evaluation and fine-grained analysis on the four LayoutBench skills to pinpoint the weaknesses of existing models. We show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order. Lastly, we evaluate the zero-shot performance of different pretrained layout-guided image generation models on LayoutBench-COCO, our new benchmark for OOD layouts with real objects, where our IterInpaint consistently outperforms SOTA baselines in all four splits.
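
The step-by-step generation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the character-grid canvas, the `toy_inpaint` placeholder (which stands in for a learned inpainting model), the function names, and the foreground-first order are all assumptions made for illustration. The sketch only shows the control flow: one inpainting call per layout box, then one call for the region no box covers.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), upper bounds exclusive
Canvas = List[List[str]]          # toy "image": a grid of characters
Mask = List[List[bool]]           # True where the inpainter may write


def box_mask(box: Box, height: int, width: int) -> Mask:
    """Mask that is True inside the given box."""
    x0, y0, x1, y1 = box
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(width)]
            for y in range(height)]


def background_mask(boxes: List[Box], height: int, width: int) -> Mask:
    """Mask that is True wherever no box covers the cell."""
    covered = [box_mask(b, height, width) for b in boxes]
    return [[not any(m[y][x] for m in covered) for x in range(width)]
            for y in range(height)]


def toy_inpaint(canvas: Canvas, mask: Mask, prompt: str) -> Canvas:
    """Hypothetical placeholder for an inpainting model.

    It writes the first character of the prompt into every masked cell
    and leaves all other cells unchanged.
    """
    return [[prompt[0] if mask[y][x] else canvas[y][x]
             for x in range(len(canvas[0]))]
            for y in range(len(canvas))]


def iterative_inpaint(
    layout: List[Tuple[str, Box]],
    height: int,
    width: int,
    inpaint: Callable[[Canvas, Mask, str], Canvas] = toy_inpaint,
    background_prompt: str = "background",
) -> Canvas:
    """Fill each object box in turn, then fill the background.

    The foreground-first order is an assumption of this sketch.
    """
    canvas: Canvas = [["." for _ in range(width)] for _ in range(height)]
    for label, box in layout:                      # one step per object
        canvas = inpaint(canvas, box_mask(box, height, width), label)
    boxes = [box for _, box in layout]
    return inpaint(canvas, background_mask(boxes, height, width),
                   background_prompt)              # final background step
```

Calling `iterative_inpaint([("cube", (0, 0, 2, 2)), ("sphere", (3, 1, 5, 3))], height=3, width=5)` returns a grid in which each box is filled with the first letter of its label and every remaining cell with `b`. Swapping `toy_inpaint` for a real model would require an image representation and conditioning inputs that this sketch does not attempt to specify.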

Related Material

[bibtex]

@InProceedings{Cho_2024_CVPR,
    author    = {Cho, Jaemin and Li, Linjie and Yang, Zhengyuan and Gan, Zhe and Wang, Lijuan and Bansal, Mohit},
    title     = {Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {5280-5289}
}