OraPO: Oracle-educated Reinforcement Learning for Data-efficient and Factual Radiology Report Generation

Chen, Zhuoxiao; Yu, Hongyang; Xu, Ying; Luo, Yadan; Duong, Long; Li, Yuan-Fang

Zhuoxiao Chen, Hongyang Yu, Ying Xu, Yadan Luo, Long Duong, Yuan-Fang Li; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 28275-28287

Abstract

Radiology report generation (RRG) aims to automatically produce clinically faithful reports from chest X-ray images. Prevailing work typically follows a scale-driven paradigm: multi-stage training over large paired corpora with oversized backbones, which makes pipelines highly data- and compute-intensive. In this paper, we propose Oracle-educated GRPO (OraPO) with a FactScore-based reward (FactS) to tackle RRG under constrained budgets. OraPO enables single-stage, RL-only training by converting failed GRPO explorations on rare or difficult studies into direct preference supervision via a lightweight oracle step. FactS grounds learning in diagnostic evidence by extracting atomic clinical facts and checking their entailment against ground-truth labels, yielding dense, interpretable sentence-level rewards. Together, OraPO and FactS form a compact framework that significantly improves learning efficiency on clinically challenging cases, setting a new state of the art on the CheXpert Plus dataset (0.341 F1) with 2-3 orders of magnitude less training data, using a small base VLM on modest hardware.
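As a rough illustration of the two ideas named in the abstract, the sketch below scores sampled reports with a sentence-level fact reward and flags a group of samples for an oracle step when none of them earns any reward. It is a toy under stated assumptions, not the paper's implementation: fact extraction is assumed to have already happened, the entailment check is replaced by plain set membership, and the function names (`fact_reward`, `report_reward`, `group_advantages`, `needs_oracle`) and the `threshold` parameter are hypothetical. Only the group-relative normalization (reward minus group mean, divided by group standard deviation) follows the standard GRPO formulation.

```python
from statistics import mean


def fact_reward(sentence_facts, positive_labels):
    """Fraction of one sentence's facts that appear in the ground-truth labels.

    Assumption: set membership stands in for a real entailment check.
    """
    if not sentence_facts:
        return 0.0
    return sum(fact in positive_labels for fact in sentence_facts) / len(sentence_facts)


def report_reward(report_facts, positive_labels):
    """Mean of the sentence-level rewards for one sampled report."""
    if not report_facts:
        return 0.0
    return mean(fact_reward(facts, positive_labels) for facts in report_facts)


def group_advantages(rewards):
    """GRPO-style advantages: (reward - group mean) / group std.

    Returns all zeros when every sample has the same reward, since the
    group then carries no relative learning signal.
    """
    mu = mean(rewards)
    variance = mean((r - mu) ** 2 for r in rewards)
    if variance == 0:
        return [0.0] * len(rewards)
    std = variance ** 0.5
    return [(r - mu) / std for r in rewards]


def needs_oracle(rewards, threshold=0.0):
    """Hypothetical trigger: no sampled report scored above the threshold."""
    return max(rewards) <= threshold


# Toy study: two ground-truth findings and two sampled reports.
labels = {"pleural effusion", "cardiomegaly"}
sample_a = [["pleural effusion"], ["pneumothorax"]]  # one correct, one wrong sentence
sample_b = [["pneumothorax"]]                        # one wrong sentence
rewards = [report_reward(sample_a, labels), report_reward(sample_b, labels)]
advantages = group_advantages(rewards)
use_oracle = needs_oracle(rewards)
```

In this toy setup a group whose samples all score zero produces zero advantages, which is the situation the abstract describes as a failed exploration; how the oracle step then builds preference supervision is left to the paper.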

Related Material

@InProceedings{Chen_2026_CVPR,
    author    = {Chen, Zhuoxiao and Yu, Hongyang and Xu, Ying and Luo, Yadan and Duong, Long and Li, Yuan-Fang},
    title     = {OraPO: Oracle-educated Reinforcement Learning for Data-efficient and Factual Radiology Report Generation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {28275-28287}
}