Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content

Zhang, Zicheng; Kou, Tengchuan; Wang, Shushi; Li, Chunyi; Sun, Wei; Wang, Wei; Li, Xiaoyu; Wang, Zongyu; Cao, Xuezhi; Min, Xiongkuo; Liu, Xiaohong; Zhai, Guangtao

Zicheng Zhang, Tengchuan Kou, Shushi Wang, Chunyi Li, Wei Sun, Wei Wang, Xiaoyu Li, Zongyu Wang, Xuezhi Cao, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 10621-10631

Abstract

Evaluating text-to-vision content hinges on two crucial aspects: **visual quality** and **alignment**. While significant progress has been made in developing objective models to assess these dimensions, the performance of such models relies heavily on the scale and quality of human annotations. According to the **Scaling Law**, increasing the number of human-labeled instances improves the performance of evaluation models in a predictable pattern. Therefore, we introduce a comprehensive dataset designed to **E**valuate **V**isual quality and **A**lignment **L**evel for text-to-vision content (**Q-EVAL-100K**), featuring the largest collection of human-labeled Mean Opinion Scores (MOS) for these two aspects. The **Q-EVAL-100K** dataset encompasses both text-to-image and text-to-video models, with 960K human annotations specifically focused on visual quality and alignment for 100K instances (60K images and 40K videos). Leveraging this dataset with context prompts, we propose **Q-Eval-Score**, a unified model capable of evaluating both visual quality and alignment, with special improvements for handling long-text prompt alignment. Experimental results indicate that the proposed **Q-Eval-Score** achieves superior performance on both visual quality and alignment, with strong generalization capabilities across other benchmarks. These findings highlight the significant value of the **Q-EVAL-100K** dataset. **The data and code will be released** to help advance generation models.

Related Material

[bibtex]

@InProceedings{Zhang_2025_CVPR,
    author    = {Zhang, Zicheng and Kou, Tengchuan and Wang, Shushi and Li, Chunyi and Sun, Wei and Wang, Wei and Li, Xiaoyu and Wang, Zongyu and Cao, Xuezhi and Min, Xiongkuo and Liu, Xiaohong and Zhai, Guangtao},
    title     = {Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {10621-10631}
}