AnyScene: Customized Image Synthesis with Composited Foreground

Ruidong Chen, Lanjun Wang, Weizhi Nie, Yongdong Zhang, An-An Liu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 8724-8733

Abstract


Recent advancements in text-to-image technology have significantly advanced the field of image customization. Among various applications the task of customizing diverse scenes for user-specified composited elements holds great application value but has not been extensively explored. Addressing this gap we propose AnyScene a specialized framework designed to create varied scenes from composited foreground using textual prompts. AnyScene addresses the primary challenges inherent in existing methods particularly scene disharmony due to a lack of foreground semantic understanding and distortion of foreground elements. Specifically we develop a foreground injection module that guides a pre-trained diffusion model to generate cohesive scenes in visual harmony with the provided foreground. To enhance robust generation we implement a layout control strategy that prevents distortions of foreground elements. Furthermore an efficient image blending mechanism seamlessly reintegrates foreground details into the generated scenes producing outputs with overall visual harmony and precise foreground details. In addition we propose a new benchmark and a series of quantitative metrics to evaluate this proposed image customization task. Extensive experimental results demonstrate the effectiveness of AnyScene which confirms its potential in various applications.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Chen_2024_CVPR, author = {Chen, Ruidong and Wang, Lanjun and Nie, Weizhi and Zhang, Yongdong and Liu, An-An}, title = {AnyScene: Customized Image Synthesis with Composited Foreground}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {8724-8733} }