[bibtex]
@InProceedings{Corneanu_2025_WACV,
  author    = {Corneanu, Ciprian A. and Feng, Qianli and Martinez, Aleix M.},
  title     = {Structured Human Assessment of Text-to-Image Generative Models},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {4481-4490}
}
Structured Human Assessment of Text-to-Image Generative Models
Abstract
Following the great progress in text-conditioned image generation, there is a dire need for clear comparison benchmarks. Unfortunately, assessing the performance of such models is highly subjective and notoriously difficult. Current automatic assessments of generated-image quality and image-text alignment are approximate at best, while human assessment is subjective, poorly calibrated, and not well defined. To address these concerns, we propose GenomeBench, a new framework for assessing the quality of text-to-image generative models. It consists of a prompt dataset richly annotated with semantic components, based on a formalized grounding of language and images. On top of it, we define a procedure for collecting human assessments through a carefully guided question-answering process. Finally, these assessments are summarized into a novel score built around quality and alignment to text. We show that the proposal achieves higher inter-annotator agreement than baseline human assessment and better correlation between quality and alignment than automatic assessment. Finally, we use this framework to dissect the performance of recent text-to-image models, providing insights into the strengths and weaknesses of each.