Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation

Yujie Zhang, Bingyang Cui, Qi Yang, Zhu Li, Yiling Xu; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 18563-18574

Abstract


Text-to-3D generation has achieved remarkable progress in recent years, yet evaluating these methods remains challenging for two reasons: i) existing benchmarks lack fine-grained evaluation across different prompt categories and evaluation dimensions; ii) previous evaluation metrics focus on only a single aspect (e.g., text-3D alignment) and fail to perform multi-dimensional quality assessment. To address these problems, we first propose a comprehensive benchmark named MATE-3D. The benchmark contains eight well-designed prompt categories that cover single and multiple object generation, resulting in 1,280 generated textured meshes. We conducted a large-scale subjective experiment across four evaluation dimensions and collected 107,520 annotations, followed by detailed analyses of the results. Based on MATE-3D, we propose a novel quality evaluator named HyperScore. By using a hypernetwork to generate a dedicated mapping function for each evaluation dimension, our metric can effectively perform multi-dimensional quality assessment. HyperScore outperforms existing metrics on MATE-3D, making it a promising metric for assessing and improving text-to-3D generation.
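To illustrate the general idea of a hypernetwork producing one mapping function per evaluation dimension, here is a minimal pure-Python sketch. It is not the HyperScore architecture, which the abstract does not detail. The linear generator, the embedding size, the feature vector, the random untrained weights, and all function names are assumptions made for illustration only.

```python
import random

def make_hypernetwork(num_dims, feat_size, embed_size=4, seed=0):
    """Build a toy linear hypernetwork (hypothetical design, untrained).

    Returns one embedding per evaluation dimension plus a shared generator
    matrix that maps an embedding to the parameters of a scoring head.
    """
    rng = random.Random(seed)
    embeddings = [[rng.gauss(0, 1) for _ in range(embed_size)]
                  for _ in range(num_dims)]
    # One generator row per head parameter: feat_size weights plus one bias.
    generator = [[rng.gauss(0, 0.1) for _ in range(embed_size)]
                 for _ in range(feat_size + 1)]
    return embeddings, generator

def generate_head(embedding, generator):
    """Produce (weights, bias) of a linear mapping function from an embedding."""
    params = [sum(g * e for g, e in zip(row, embedding)) for row in generator]
    return params[:-1], params[-1]

def score(features, dim_index, hypernet):
    """Score one sample on one evaluation dimension with its generated head."""
    embeddings, generator = hypernet
    weights, bias = generate_head(embeddings[dim_index], generator)
    return sum(w * f for w, f in zip(weights, features)) + bias

# Four evaluation dimensions, as in the abstract; indices are placeholders.
hypernet = make_hypernetwork(num_dims=4, feat_size=8)
features = [0.1 * i for i in range(8)]  # stand-in for a feature of one mesh/prompt pair
scores = [score(features, d, hypernet) for d in range(4)]
print(scores)
```

In this sketch the same feature vector yields a different score per dimension because each dimension's embedding generates its own head, while the generator itself is shared across dimensions.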

Related Material


@InProceedings{Zhang_2025_ICCV,
    author    = {Zhang, Yujie and Cui, Bingyang and Yang, Qi and Li, Zhu and Xu, Yiling},
    title     = {Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {18563-18574}
}