MixSyn: Compositional Image Synthesis with Fuzzy Masks and Style Fusion

Ilke Demir, Umur Aybars Ciftci; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 7460-7469

Abstract


Synthetic images created by generative models increase in quality and expressiveness as newer models utilize larger datasets and novel architectures. Although this photorealism is a benefit from a creative standpoint, expressiveness is still limited by the training data. Most of these approaches are built on the transfer between source and target pairs, or they generate completely new samples based on an ideal distribution, still resembling the closest real sample while missing less frequent or non-existent compositions. We propose MixSyn (read as "mixin'") to learn novel fuzzy compositions from multiple sources and to create novel images as a mix of image regions corresponding to the compositions. MixSyn not only combines uncorrelated regions from multiple source masks into a coherent semantic composition, but also generates mask-aware, high-quality reconstructions of non-existing images. We compare MixSyn to state-of-the-art single-source sequential generation and collage generation approaches in terms of quality, diversity, realism, and expressive power, and we also compare region-wise reconstruction and similarity scores. We also showcase interactive synthesis, mix & match, design space exploration, and edit propagation tasks, with no mask dependency.

Related Material


[bibtex]
@InProceedings{Demir_2024_CVPR,
    author    = {Demir, Ilke and Ciftci, Umur Aybars},
    title     = {MixSyn: Compositional Image Synthesis with Fuzzy Masks and Style Fusion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {7460-7469}
}