MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers

Yang Tian, Zheng Lu, Mingqi Gao, Zheng Liu, Bo Zhao; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 488-497

Abstract


Fully comprehending scientific papers reflects a high level of Artificial General Intelligence, as it requires reasoning across fragmented and heterogeneous sources of information, a complex and practically significant challenge. While Vision-Language Models (VLMs) have made remarkable strides in various tasks, particularly those involving reasoning over evidence drawn from a single image or text page, their ability to use cross-source information for reasoning remains an open problem. This work presents MMCR, a high-difficulty benchmark designed to evaluate VLMs' capacity for reasoning with cross-source information from scientific papers. The benchmark comprises 276 high-quality questions, meticulously annotated by humans, spanning 7 subjects and 10 task types. Experiments with 18 VLMs demonstrate that cross-source reasoning presents a substantial challenge for existing models. Notably, even the top-performing model, GPT-4o, achieved only 48.55% overall accuracy, with just 20% accuracy on multi-table comprehension tasks, while the second-best model, Qwen2.5-VL-72B, reached 39.86% overall accuracy. These results highlight the pressing need to develop VLMs capable of effectively utilizing cross-source information for reasoning.

Related Material


[bibtex]
@InProceedings{Tian_2025_ICCV,
    author    = {Tian, Yang and Lu, Zheng and Gao, Mingqi and Liu, Zheng and Zhao, Bo},
    title     = {MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {488-497}
}