MerCulture: A Comprehensive Benchmark to Evaluate Vision-Language Models on Cultural Understanding in Singapore

Tushar, Pranav; Pandey, Eshan; Austria, Lyka Diane Bala; Loo, Yin Yin; Lim, Jing Hao; Atmosukarto, Indriyati; Lock, Donny Soh Cheng

Pranav Tushar, Eshan Pandey, Lyka Diane Bala Austria, Yin Yin Loo, Jing Hao Lim, Indriyati Atmosukarto, Donny Soh Cheng Lock; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2025, pp. 565-574

Abstract

Vision-Language Models (VLMs) have achieved remarkable performance across multimodal tasks. However, they continue to exhibit significant limitations in cultural understanding, particularly when interpreting non-Western imagery. These limitations primarily stem from biases in training data, which predominantly reflect Western-centric perspectives. Existing benchmarks fail to address this issue, lacking diversity in cultural representation and evaluation criteria. To bridge this gap, we introduce MerCulture, a multimodal benchmark designed to assess VLMs' ability to interpret culturally significant objects, traditions, and symbols. MerCulture consists of two core tasks: MerCulture VQA, which evaluates culturally grounded question answering, and MerCulture Visual Grounding, which measures object-context associations. To ensure rigorous evaluation, we propose novel metrics tailored for cultural fidelity and bias measurement. Specifically, for the MerCulture VQA task, we introduce the Cultural Alignment Score (CAS) and Bias Reinforcement Rate (BRR). For MerCulture Visual Grounding, we define the Mean Cultural Grounding Score (MCGS) and Textual Alignment Score (TAS). Benchmarking state-of-the-art VLMs on MerCulture reveals substantial performance disparities, underscoring the urgent need for more culturally inclusive multimodal AI systems. Our findings establish a foundation for advancing cross-cultural AI applications in domains such as education, heritage preservation, and multilingual systems.

Related Material

[bibtex]

@InProceedings{Tushar_2025_CVPR,
    author    = {Tushar, Pranav and Pandey, Eshan and Austria, Lyka Diane Bala and Loo, Yin Yin and Lim, Jing Hao and Atmosukarto, Indriyati and Lock, Donny Soh Cheng},
    title     = {MerCulture: A Comprehensive Benchmark to Evaluate Vision-Language Models on Cultural Understanding in Singapore},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {565-574}
}