Layered Diffusion Model for One-Shot High Resolution Text-to-Image Synthesis

Emaad Khwaja, Abdullah Rashwan, Ting Chen, Oliver Wang, Suraj Kothawade, Yeqing Li; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 8271-8274

Abstract


We present a one-shot text-to-image diffusion model that can generate high-resolution images from natural language descriptions. Our model employs a layered U-Net architecture that simultaneously synthesizes images at multiple resolution scales. We show that this method outperforms the baseline of synthesizing images only at the target resolution while reducing the computational cost per step. We demonstrate that higher resolution synthesis can be achieved by layering convolutions at additional resolution scales, in contrast to other methods, which require additional models for super-resolution synthesis.
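The core idea of the layered architecture — a single forward pass that emits images at several resolution scales by stacking convolutions at additional scales, rather than chaining separate super-resolution models — can be illustrated with a toy sketch. This is not the paper's implementation; the `upsample2x`, `conv3x3`, and `layered_decode` functions below are hypothetical stand-ins that only demonstrate how one decoder can produce multi-scale outputs simultaneously.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor upsampling: repeat each pixel 2x along H and W.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv3x3(x, kernel):
    # Naive 'same' 3x3 convolution with zero padding (single channel).
    h, w = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

def layered_decode(latent, kernels):
    # Starting from a low-resolution latent, each layer upsamples 2x and
    # applies a convolution, emitting an image at that scale -- so a single
    # pass yields outputs at every resolution, with no separate
    # super-resolution model.
    outputs = [latent]
    x = latent
    for k in kernels:
        x = conv3x3(upsample2x(x), k)
        outputs.append(x)
    return outputs

latent = np.random.default_rng(0).normal(size=(8, 8))
kernels = [np.full((3, 3), 1 / 9) for _ in range(3)]  # 8 -> 16 -> 32 -> 64
outs = layered_decode(latent, kernels)
print([o.shape for o in outs])  # [(8, 8), (16, 16), (32, 32), (64, 64)]
```

Each added layer roughly doubles the output resolution while reusing the lower-resolution features, which is why extra scales add convolutions rather than an entirely new model.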

Related Material


@InProceedings{Khwaja_2024_CVPR,
  author    = {Khwaja, Emaad and Rashwan, Abdullah and Chen, Ting and Wang, Oliver and Kothawade, Suraj and Li, Yeqing},
  title     = {Layered Diffusion Model for One-Shot High Resolution Text-to-Image Synthesis},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2024},
  pages     = {8271-8274}
}