FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models

Lin Zhao, Tianchen Zhao, Zinan Lin, Xuefei Ning, Guohao Dai, Huazhong Yang, Yu Wang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 16122-16131

Abstract


In recent years there has been significant progress in the development of text-to-image generative models. Evaluating the quality of the generative models is one essential step in the development process. Unfortunately the evaluation process could consume a significant amount of computational resources making the required periodic evaluation of model performance (e.g. monitoring training progress) impractical. Therefore we seek to improve the evaluation efficiency by selecting the representative subset of the text-image dataset. We systematically investigate the design choices including the selection criteria (textural features or imagebased metrics) and the selection granularity (prompt-level or set-level). We find that the insights from prior work on subset selection for training data do not generalize to this problem and we propose FlashEval an iterative search algorithm tailored to evaluation data selection. We demonstrate the effectiveness of FlashEval on ranking diffusion models with various configurations including architectures quantization levels and sampler schedules on COCO and DiffusionDB datasets. Our searched 50-item subset could achieve comparable evaluation quality to the randomly sampled 500-item subset for COCO annotations on unseen models achieving a 10x evaluation speedup. We release the condensed subset of these commonly used datasets to help facilitate diffusion algorithm design and evaluation and open-source FlashEval as a tool for condensing future datasets accessible at https://github.com/thu-nics/FlashEval.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Zhao_2024_CVPR, author = {Zhao, Lin and Zhao, Tianchen and Lin, Zinan and Ning, Xuefei and Dai, Guohao and Yang, Huazhong and Wang, Yu}, title = {FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {16122-16131} }