HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models

Bakr, Eslam Mohamed; Sun, Pengzhan; Shen, Xiaoqian; Khan, Faizan Farooq; Li, Li Erran; Elhoseiny, Mohamed

Eslam Mohamed Bakr, Pengzhan Sun, Xiaoqian Shen, Faizan Farooq Khan, Li Erran Li, Mohamed Elhoseiny; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 20041-20053

Abstract

Designing robust text-to-image (T2I) models has been extensively explored in recent years, especially with the emergence of diffusion models, which achieve state-of-the-art results on T2I synthesis tasks. Despite the significant effort and success in this direction, we observe that the existing metrics are not robust enough to measure real progress. Consequently, comparing existing models is difficult and relies heavily on subjective human evaluation. In addition, we observe that efforts in developing new architectures have not been matched by efforts in evaluation. Driven by this observation, a concrete evaluation is needed to close the gap between design and evaluation efforts. Accordingly, we introduce HRS-Bench, a holistic, reliable, and scalable benchmark for T2I models. Unlike existing benchmarks, which focus on a limited set of aspects, HRS-Bench measures 13 skills, grouped into five major categories: accuracy, robustness, generalization, fairness, and bias. In addition, HRS-Bench covers 50 applications, e.g., fashion, animals, transportation, food, and clothes. We evaluate nine recent large-scale T2I models using metrics that cover this wide range of skills. To probe the effectiveness of HRS-Bench, we conduct a human evaluation, which shows 95% agreement with our evaluations on average across the 13 skills. We hope our findings, e.g., that none of the existing models can generate visual text or emotionally grounded images, help accelerate and direct future research. The code and data are available at https://eslambakr.github.io/hrsbench.github.io/.
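The abstract does not specify how the 95% human agreement figure is computed. As a minimal sketch only, assuming that each skill yields a binary human verdict and a binary automatic verdict per generated image, and that the reported number is an unweighted mean of per-skill agreement rates, the averaging could look like the following. The function names and the toy data are hypothetical and are not taken from the HRS-Bench code or results.

```python
from statistics import mean


def skill_agreement(human: list[bool], metric: list[bool]) -> float:
    """Fraction of samples where the human verdict matches the automatic verdict.

    Hypothetical helper: assumes one binary verdict per generated image.
    """
    if len(human) != len(metric) or not human:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(h == m for h, m in zip(human, metric))
    return matches / len(human)


def average_agreement(per_skill: dict[str, tuple[list[bool], list[bool]]]) -> float:
    """Macro-average: agreement is computed per skill, then averaged with equal weight."""
    return mean(skill_agreement(h, m) for h, m in per_skill.values())


# Toy, made-up verdicts for two placeholder skills (not HRS-Bench data).
toy = {
    "skill_a": ([True, True, False, True], [True, True, False, False]),  # 3 of 4 match
    "skill_b": ([False, True], [False, True]),                           # 2 of 2 match
}
print(average_agreement(toy))  # 0.875
```

A macro-average weights every skill equally regardless of how many prompts it has; pooling all samples across skills would instead weight larger skills more heavily, so the two choices can give different numbers.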

BibTeX

@InProceedings{Bakr_2023_ICCV,
    author    = {Bakr, Eslam Mohamed and Sun, Pengzhan and Shen, Xiaoqian and Khan, Faizan Farooq and Li, Li Erran and Elhoseiny, Mohamed},
    title     = {HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {20041-20053}
}