-
[pdf]
[supp]
[arXiv]
[bibtex]@InProceedings{Gao_2025_ICCV, author = {Gao, Yuan and Kim, Sangwook and You, Jianzhong and McIntosh, Chris}, title = {ProbMED: A Probabilistic Framework for Medical Multimodal Binding}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2025}, pages = {20157-20167} }
ProbMED: A Probabilistic Framework for Medical Multimodal Binding
Abstract
Medical decision-making requires integrating diverse medical information, from imaging to clinical narratives. These medical modalities are often acquired in a many-to-many manner. However, current medical vision-language pretraining models (Med-VLPMs) fail to directly account for this many-to-many mapping in their model training and embeddings. To address this, we present Probabilistic Modality-Enhanced Diagnosis (ProbMED), a multimodal Med-VLPM that employs probabilistic contrastive learning to model distributions over embeddings rather than deterministic estimates. ProbMED aligns four distinct modalities--chest X-rays, electrocardiograms, echocardiograms, and clinical text--into a unified probabilistic embedding space. We use InfoNCE loss with Hellinger distance to integrate inter-modality distributions. We introduce a probabilistic synthetic sampling loss that captures modality-specific mean and variance to improve intra-modality binding. Extensive experiments across 13 medical datasets demonstrate that our model outperforms current Med-VLPMs in cross-modality retrieval, zero-shot, and few-shot classification. We also demonstrate the robust integration of multiple modalities for prognostication, showing improved intra- and inter-medical modality binding.
Related Material
