Taxonomy-Aware Evaluation of Vision-Language Models

Snæbjarnarson, Vésteinn; Du, Kevin; Stoehr, Niklas; Belongie, Serge; Cotterell, Ryan; Lang, Nico; Frank, Stella

Vésteinn Snæbjarnarson, Kevin Du, Niklas Stoehr, Serge Belongie, Ryan Cotterell, Nico Lang, Stella Frank; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 9109-9120

Abstract

When a vision-language model (VLM) is prompted to identify an entity depicted in an image, it may answer "I see a conifer" rather than the specific label "Norway spruce". This raises two issues for evaluation. First, the unconstrained generated text needs to be mapped to the evaluation label space (here, to "conifer"). Second, a useful classification measure should give partial credit to answers that are less specific but not incorrect ("Norway spruce" being a type of "conifer"). To meet these requirements, we propose a framework for evaluating unconstrained text predictions, such as those generated by a VLM, against a taxonomy. Specifically, we propose the use of hierarchical precision and recall measures to assess the correctness and specificity of predictions with respect to a taxonomy. Experimentally, we first show that existing text similarity measures do not capture taxonomic similarity well. We then develop and compare different methods for mapping textual VLM predictions onto a taxonomy. This allows us to compute hierarchical similarity measures between the generated text and the ground truth labels. Finally, we analyze modern VLMs on fine-grained visual classification tasks using our proposed taxonomic evaluation scheme.
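To make the idea of partial credit concrete, the following is a minimal sketch of hierarchical precision and recall in their commonly used ancestor-set form, in which a prediction and a ground-truth label are each expanded to the set containing the node and all of its ancestors. The abstract does not spell out the exact definition used in the paper, so this formulation, the toy taxonomy `PARENT`, and the function names are assumptions made for illustration only. The sketch also assumes the prediction has already been mapped to a taxonomy node and that the root counts as an ancestor.

```python
# Hypothetical toy taxonomy, stored as child -> parent (root has parent None).
PARENT = {
    "plant": None,
    "conifer": "plant",
    "spruce": "conifer",
    "Norway spruce": "spruce",
    "pine": "conifer",
    "Scots pine": "pine",
}


def ancestors_inclusive(node, parent=PARENT):
    """Return the node together with all of its ancestors up to the root."""
    result = set()
    while node is not None:
        result.add(node)
        node = parent[node]
    return result


def hierarchical_pr(predicted, truth, parent=PARENT):
    """Ancestor-set hierarchical precision and recall (illustrative only).

    precision = |anc(pred) & anc(truth)| / |anc(pred)|
    recall    = |anc(pred) & anc(truth)| / |anc(truth)|
    """
    pred_set = ancestors_inclusive(predicted, parent)
    true_set = ancestors_inclusive(truth, parent)
    overlap = len(pred_set & true_set)
    return overlap / len(pred_set), overlap / len(true_set)


# A less specific but correct answer keeps full precision and loses recall.
print(hierarchical_pr("conifer", "Norway spruce"))
# A specific but wrong answer loses both.
print(hierarchical_pr("Scots pine", "Norway spruce"))
```

Under these assumptions, answering "conifer" for a "Norway spruce" image is penalized only for its lack of specificity, whereas answering "Scots pine" is penalized for being wrong below the shared "conifer" node.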

Related Material

[bibtex]

@InProceedings{Snaebjarnarson_2025_CVPR,
    author    = {Sn{\ae}bjarnarson, V\'esteinn and Du, Kevin and Stoehr, Niklas and Belongie, Serge and Cotterell, Ryan and Lang, Nico and Frank, Stella},
    title     = {Taxonomy-Aware Evaluation of Vision-Language Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {9109-9120}
}