Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models

Cao, Weiwei; Zhang, Jianpeng; Xia, Yingda; Mok, Tony C. W.; Li, Zi; Ye, Xianghua; Lu, Le; Zheng, Jian; Tang, Yuxing; Zhang, Ling

Weiwei Cao, Jianpeng Zhang, Yingda Xia, Tony C. W. Mok, Zi Li, Xianghua Ye, Le Lu, Jian Zheng, Yuxing Tang, Ling Zhang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 11238-11247

Abstract

Radiologists strongly desire fully automated, versatile AI for medical image interpretation. However, the lack of extensively annotated, large-scale, multi-disease datasets has hindered progress toward this goal. In this paper, we explore the feasibility of leveraging language as naturally high-quality supervision for chest CT imaging. Given the limited availability of image-report pairs, we bootstrap the understanding of 3D chest CT images by distilling chest-related diagnostic knowledge from an extensively pre-trained 2D X-ray expert model. Specifically, we propose a language-guided retrieval method that matches each 3D CT image with its semantically closest 2D X-ray image, and we then perform pair-wise and semantic relation knowledge distillation. Subsequently, we use contrastive learning to align images and reports from the same patient while distinguishing them from those of other patients. However, a challenge arises when different patients have semantically similar diagnoses: healthy patients, for example, can cause confusion if treated as negatives of one another. We introduce a robust contrastive learning method that identifies and corrects these false negatives. We train our model on over 12K pairs of chest CT images and radiology reports. Extensive experiments across multiple scenarios, including zero-shot learning, report generation, and fine-tuning, demonstrate the model's feasibility in interpreting chest CT images.
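The abstract names two mechanisms without giving their formulas: language-guided retrieval of the closest X-ray, and a contrastive loss that corrects false negatives. Below is a minimal Python sketch under stated assumptions, not the authors' implementation. Embeddings are assumed to be plain lists of floats produced by some report or image encoder; retrieval is assumed to be nearest-neighbour search by cosine similarity of report embeddings; and false negatives are handled by masking out other patients whose reports exceed a hypothetical similarity threshold `fn_threshold`. This masking is a common false-negative elimination strategy swapped in for illustration and may differ from the paper's actual correction scheme. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def retrieve_closest_xray(ct_report_emb: Sequence[float],
                          xray_report_embs: List[Sequence[float]]) -> int:
    """Hypothetical language-guided retrieval: return the index of the X-ray
    whose report embedding is most similar to the CT report embedding."""
    sims = [cosine(ct_report_emb, x) for x in xray_report_embs]
    return max(range(len(sims)), key=sims.__getitem__)


def robust_contrastive_loss(img_embs: List[Sequence[float]],
                            txt_embs: List[Sequence[float]],
                            temperature: float = 0.07,
                            fn_threshold: float = 0.9) -> float:
    """Image-to-report InfoNCE with false-negative masking (assumed scheme).

    Patient i's own report is the positive. Another patient j is flagged as a
    false negative when report j is nearly identical to report i
    (cosine >= fn_threshold) and is dropped from the softmax denominator,
    so the loss no longer pushes semantically equivalent patients apart.
    """
    n = len(img_embs)
    total = 0.0
    for i in range(n):
        pos_logit = cosine(img_embs[i], txt_embs[i]) / temperature
        kept_logits = []
        for j in range(n):
            is_false_neg = j != i and cosine(txt_embs[i], txt_embs[j]) >= fn_threshold
            if not is_false_neg:
                kept_logits.append(cosine(img_embs[i], txt_embs[j]) / temperature)
        # Numerically stable log-sum-exp over the retained candidates.
        m = max(kept_logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in kept_logits))
        total += log_denom - pos_logit
    return total / n
```

Setting `fn_threshold` above 1 means no pair is ever flagged, which recovers the standard InfoNCE loss. With two patients who have identical "healthy" reports, the standard loss stays near log 2 for each of them because each report competes with its duplicate, while the masked version treats the duplicate as non-negative and drives that penalty toward zero.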

Related Material

@InProceedings{Cao_2024_CVPR,
    author    = {Cao, Weiwei and Zhang, Jianpeng and Xia, Yingda and Mok, Tony C. W. and Li, Zi and Ye, Xianghua and Lu, Le and Zheng, Jian and Tang, Yuxing and Zhang, Ling},
    title     = {Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {11238-11247}
}