Localization and Manipulation of Immoral Visual Cues for Safe Text-to-Image Generation

Seongbeom Park, Suhong Moon, Seunghyun Park, Jinkyu Kim; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 4675-4684

Abstract


Current text-to-image generation methods produce high-resolution and high-quality images, but they should not produce immoral images that may contain inappropriate content from the perspective of commonsense morality. Conventional approaches, however, often neglect these ethical concerns, and existing solutions are often limited in their ability to ensure moral compatibility. To address this, we propose a novel method that has three main capabilities: (1) our model recognizes the degree of visual commonsense immorality of a given generated image, (2) our model localizes immoral visual (and textual) attributes that make the image visually immoral, and (3) our model manipulates such immoral visual cues into a morally-qualifying alternative. We conduct experiments with various text-to-image generation models, including the state-of-the-art Stable Diffusion model, demonstrating the efficacy of our ethical image manipulation approach. Our human study further confirms that our method is indeed able to generate morally-satisfying images from immoral ones.

Related Material


@InProceedings{Park_2024_WACV,
    author    = {Park, Seongbeom and Moon, Suhong and Park, Seunghyun and Kim, Jinkyu},
    title     = {Localization and Manipulation of Immoral Visual Cues for Safe Text-to-Image Generation},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {4675-4684}
}