Prompt Augmentation for Self-supervised Text-guided Image Manipulation

Rumeysa Bodur, Binod Bhattarai, Tae-Kyun Kim; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 8829-8838

Abstract

Text-guided image editing finds applications in various creative and practical fields. While recent studies in image generation have advanced the field, they often struggle with the dual challenges of coherent image transformation and context preservation. In response, our work introduces prompt augmentation, a method that amplifies a single input prompt into several target prompts, strengthening textual context and enabling localised image editing. Specifically, we use the augmented prompts to delineate the intended manipulation area. We propose a Contrastive Loss tailored to driving effective image editing by displacing edited areas and drawing preserved regions closer. Acknowledging the continuous nature of image manipulations, we further refine our approach by incorporating the similarity concept, creating a Soft Contrastive Loss. The new losses are incorporated into the diffusion model, demonstrating improved or competitive image editing results over state-of-the-art approaches on public datasets and generated images.
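To illustrate the intuition behind the proposed losses, the sketch below shows one possible soft contrastive objective over per-pixel features: features of preserved regions in the source and edited images are pulled together, while features of edited regions (the area delineated by the augmented prompts) are pushed apart, with per-pixel weights softened by feature similarity. This is only an illustrative PyTorch sketch; the tensor names, the masking scheme, and the margin hinge are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(feat_src, feat_edit, edit_mask, margin=1.0):
    """Illustrative soft contrastive loss (not the paper's exact form).

    feat_src, feat_edit: (B, C, H, W) features of the source and edited images
    edit_mask: (B, 1, H, W) values in [0, 1], close to 1 where the augmented
               prompts indicate the region to be manipulated (assumed input)
    """
    # Per-pixel L2 distance between source and edited features
    dist = (feat_src - feat_edit).pow(2).sum(dim=1, keepdim=True).sqrt()

    # Cosine similarity turns the hard edit mask into soft per-pixel weights
    sim = F.cosine_similarity(feat_src, feat_edit, dim=1).unsqueeze(1)  # (B,1,H,W)
    soft_keep = (1.0 - edit_mask) * sim.clamp(min=0)        # preserved, similar pixels
    soft_edit = edit_mask * (1.0 - sim).clamp(min=0)        # edited, dissimilar pixels

    # Pull preserved regions closer; push edited regions beyond a margin
    loss_keep = (soft_keep * dist.pow(2)).mean()
    loss_edit = (soft_edit * F.relu(margin - dist).pow(2)).mean()
    return loss_keep + loss_edit
```

In this reading, setting the soft weights to the hard mask values recovers a plain contrastive loss, while the similarity-based weighting reflects the continuous nature of manipulations described in the abstract.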

Related Material

[pdf] [supp]
[bibtex]
@InProceedings{Bodur_2024_CVPR,
    author    = {Bodur, Rumeysa and Bhattarai, Binod and Kim, Tae-Kyun},
    title     = {Prompt Augmentation for Self-supervised Text-guided Image Manipulation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {8829-8838}
}