@InProceedings{Ahsan_2025_CVPR,
    author    = {Ahsan, Syed Bilal and Ikhalas, Muhammad and Khan, Muhammad Muzamil and Ullah, Sana and Zaheer, Muhammad Zaigham},
    title     = {ARDGen: Augmentation Regularization for Domain-Generalized Medical Report Generation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {6535-6544}
}
ARDGen: Augmentation Regularization for Domain-Generalized Medical Report Generation
Abstract
Automated medical report generation from chest radiographs is pivotal for clinical decision support, yet existing systems degrade in performance under domain shifts across diverse imaging sources. In this work, we propose a multi-modal framework that robustly generates clinically relevant diagnostic reports by integrating visual and textual modalities. Our model comprises two branches: an image classification branch that employs a pre-trained ResNet-based encoder with advanced image augmentation and consistency regularization, and a report generation branch that features dual BERT-based decoders. The primary text decoder produces the diagnostic narrative, while an Augmentation Regularization Decoder (ARD), used exclusively during training, serves as a regularizer to enhance the model's adaptability. We further enforce text-level consistency through augmentation-driven losses. Extensive experiments conducted on the MIMIC-CXR and IU-Xray datasets demonstrate that our approach significantly outperforms existing methods, achieving superior generalization and improved report quality on unseen data. This framework offers a scalable and robust solution for reliable automated diagnosis, bridging the gap between visual evidence and accurate clinical narratives.
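The abstract does not give the form of the consistency regularization or say how the training losses are combined. As a rough illustration only, the sketch below assumes a symmetric KL divergence between class predictions on a clean and an augmented view of the same image, and a weighted sum of the primary report loss, the ARD loss, and the consistency term. The function names and the weights `w_ard` and `w_cons` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(logits_clean, logits_aug):
    """Assumed form: symmetric KL between predictions on the two views.

    The loss is zero when both views give the same prediction and grows
    as the predictions diverge.
    """
    p = softmax(logits_clean)
    q = softmax(logits_aug)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def total_loss(report_loss, ard_loss, cons_loss, w_ard=0.5, w_cons=0.1):
    """Assumed training objective: primary report loss plus weighted regularizers.

    w_ard and w_cons are hypothetical weights. Per the abstract, the ARD is
    used only during training, so this sum would not be computed at inference.
    """
    return report_loss + w_ard * ard_loss + w_cons * cons_loss


if __name__ == "__main__":
    clean = [2.0, 0.5, -1.0]  # made-up logits for the clean view
    aug = [1.5, 0.8, -0.7]    # made-up logits for the augmented view
    cons = consistency_loss(clean, aug)
    print("consistency:", cons)
    print("total:", total_loss(report_loss=2.0, ard_loss=1.0, cons_loss=cons))
```

In this sketch the consistency term penalizes the classifier whenever augmentation changes its prediction, which is one common way to encourage robustness to shifts in image appearance. The paper's actual losses may differ.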