Towards an Exhaustive Evaluation of Vision-Language Foundation Models

Salin, Emmanuelle; Ayache, Stéphane; Favre, Benoit

Emmanuelle Salin, Stéphane Ayache, Benoit Favre; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 339-352

Abstract

Vision-language foundation models have improved considerably in performance over the last few years. However, there is still a lack of comprehensive evaluation methods able to clearly explain their performance. We argue that a more systematic approach to foundation model evaluation would benefit their use in real-world applications. In particular, we think that these models should be evaluated on a broad range of specific capabilities, in order to raise awareness of the breadth of their scope and their potential weaknesses. To that end, we propose a methodology to build a taxonomy of multimodal capabilities for vision-language foundation models. The proposed taxonomy is intended as a first step towards an exhaustive evaluation of vision-language foundation models.

Related Material

[bibtex]

@InProceedings{Salin_2023_ICCV,
    author    = {Salin, Emmanuelle and Ayache, St\'ephane and Favre, Benoit},
    title     = {Towards an Exhaustive Evaluation of Vision-Language Foundation Models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {339-352}
}