ChartR: Evaluating Reasoning Accuracy and Robustness in Chart Question Answering

Xiaojun Chen, Sixiao Luo, Ziqi Liu, Min Yang, Qin Zhang, Liang-Jie Zhang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 41193-41202

Abstract


Chart Question Answering (CQA) benchmarks are critical for evaluating Multimodal Large Language Models (MLLMs) on visual data reasoning. Existing benchmarks focus mainly on final-answer correctness and ignore both the intermediate reasoning steps and the propagation of errors in multi-step processes. To address this, we introduce ChartR, a benchmark designed to assess both the accuracy and the robustness of reasoning in chart-understanding tasks. Each question is decomposed into 4 to 10 sub-questions covering key reasoning types, and each chart includes four visually perturbed variants (blurred, noise-added, watermark-added, annotation-removed) to systematically evaluate robustness. ChartR contains 200 base charts, 800 variants, 1,652 questions, and 8,260 image-question pairs. We further propose a comprehensive evaluation framework with eight metrics that measure reasoning-chain accuracy and robustness under visual perturbations, and that enable analysis of potential error-propagation patterns. Experiments on twelve MLLMs, including general-purpose and chart-specialized models, reveal low reasoning reliability, early-step errors that may propagate, value extraction as the primary bottleneck, and sharp performance drops under perturbations, highlighting a reliance on textual cues over true visual understanding.
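The dataset sizes reported above are mutually consistent, which the following minimal sketch checks. It rests on two assumptions that the abstract implies but does not state outright: every base chart has exactly one variant per perturbation type, and every question is paired with the base chart and with each of its variants. The variable names are illustrative only and are not taken from the benchmark's code.

```python
# Counts reported in the abstract.
BASE_CHARTS = 200
QUESTIONS = 1652
PERTURBATIONS = ["blurred", "noise-added", "watermark-added", "annotation-removed"]

# Assumption: one variant per perturbation type for every base chart.
variants = BASE_CHARTS * len(PERTURBATIONS)

# Assumption: each question is asked on the base chart plus each of its variants.
images_per_question = 1 + len(PERTURBATIONS)
image_question_pairs = QUESTIONS * images_per_question

print(variants)              # 800
print(image_question_pairs)  # 8260
```

Under these assumptions the computed values match the reported 800 variants and 8,260 image-question pairs.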

Related Material


@InProceedings{Chen_2026_CVPR,
    author    = {Chen, Xiaojun and Luo, Sixiao and Liu, Ziqi and Yang, Min and Zhang, Qin and Zhang, Liang-Jie},
    title     = {ChartR: Evaluating Reasoning Accuracy and Robustness in Chart Question Answering},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {41193-41202}
}