GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition

Shih-Cheng Huang, Liyue Shen, Matthew P. Lungren, Serena Yeung; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 3942-3951

Abstract


In recent years, the growing number of medical imaging studies has placed an ever-increasing burden on radiologists. Deep learning provides a promising solution for automatic medical image analysis and clinical decision support. However, the large-scale manually labeled datasets required for training deep neural networks are difficult and expensive to obtain for medical images. The purpose of this work is to develop label-efficient multimodal medical imaging representations by leveraging radiology reports. Specifically, we propose an attention-based framework (GLoRIA) for learning global and local representations by contrasting image sub-regions with words in the paired report. In addition, we propose methods to leverage the learned representations for various downstream medical image recognition tasks with limited labels. Our results demonstrate high performance and label efficiency for image-text retrieval, classification (in both fine-tuning and zero-shot settings), and segmentation on different datasets.
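The core idea described above, attention-pooling image sub-regions into word-specific context vectors and contrasting paired embeddings, can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the authors' implementation; the function names, temperature values, and the plain InfoNCE-style loss are assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_attention_features(word_emb, region_emb, temp=0.1):
    """For each word embedding, attention-pool the image sub-region
    embeddings into a word-specific image context vector (the 'local'
    representation). word_emb: (n_words, d), region_emb: (n_regions, d).
    Names and temperature are illustrative assumptions."""
    sim = word_emb @ region_emb.T             # (n_words, n_regions)
    attn = softmax(sim / temp, axis=-1)       # attention over regions per word
    return attn @ region_emb                  # (n_words, d)

def contrastive_loss(a, b, temp=0.1):
    """Symmetric InfoNCE-style loss over a batch of paired embeddings
    (e.g. global image vs. report embeddings). Matched pairs sit on the
    diagonal of the similarity matrix."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    logits = a @ b.T / temp                   # (batch, batch)
    idx = np.arange(len(a))
    loss_ab = -np.log(softmax(logits, axis=1)[idx, idx]).mean()
    loss_ba = -np.log(softmax(logits, axis=0)[idx, idx]).mean()
    return 0.5 * (loss_ab + loss_ba)
```

In a full framework, the local loss would contrast each word against its attention-pooled image context within the batch, and the global loss would contrast whole-image and whole-report embeddings; the sketch above shows only the two building blocks.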

Related Material


@InProceedings{Huang_2021_ICCV,
  author    = {Huang, Shih-Cheng and Shen, Liyue and Lungren, Matthew P. and Yeung, Serena},
  title     = {GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2021},
  pages     = {3942-3951}
}