@InProceedings{Choi_2025_ICCV,
    author    = {Choi, Seo-Yeon and Lee, Kyungsu},
    title     = {Patient-Centric Statistical Multi-Modal Fusion for Medical Diagnosis: Integrating DICOM, Radiomics, and Patient Attributes},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {2273-2284}
}
Patient-Centric Statistical Multi-Modal Fusion for Medical Diagnosis: Integrating DICOM, Radiomics, and Patient Attributes
Abstract
Deep learning (DL) has led to substantial progress in medical image analysis, particularly for disease classification. However, integrating patient-specific attributes, such as age, body mass index (BMI), and lifestyle factors, with radiomics and raw imaging data remains a key challenge in developing personalized diagnostic models. To address this, we propose a novel multi-modal framework, the Statistically Coherent Network (SCN), which jointly models imaging, radiomic, and patient metadata through a structured multi-space latent representation. SCN promotes distributional coherence across subpopulations by combining a newly devised statistics-based loss with a triplet loss, thereby aligning feature distributions among clinically similar cohorts. This t-test-based statistical alignment supports more interpretable and robust representation learning across heterogeneous patient groups. We evaluate SCN on four clinically diverse tasks: breast cancer (mammography), obstructive sleep apnea (CT), rotator cuff tear (MRI), and Cormack-Lehane grading (X-ray), and demonstrate consistent improvements over conventional single-space and multi-modal baselines. The experimental results highlight the importance of explicitly incorporating patient metadata into multi-modal learning to enhance model generalizability and clinical relevance.
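The abstract does not give the exact form of the statistics-based loss, so the following is only a minimal sketch of how a t-test-style alignment term might be combined with a triplet loss. It assumes a per-dimension Welch t-statistic between the embeddings of two clinically similar cohorts, penalized by its mean square, and a standard margin-based triplet loss. All function names, the `weight` parameter, and the squared-t penalty are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def welch_t_statistic(a, b, eps=1e-8):
    """Per-dimension Welch t-statistic between two embedding groups.

    a has shape (n_a, d) and b has shape (n_b, d); returns shape (d,).
    """
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    var_a, var_b = a.var(axis=0, ddof=1), b.var(axis=0, ddof=1)
    # eps keeps the standard error strictly positive for degenerate groups.
    std_err = np.sqrt(var_a / a.shape[0] + var_b / b.shape[0] + eps)
    return (mean_a - mean_b) / std_err


def statistical_alignment_loss(cohort_a, cohort_b):
    """Hypothetical alignment term: mean squared t-statistic.

    Small when the two cohorts have statistically similar feature means.
    """
    t = welch_t_statistic(cohort_a, cohort_b)
    return float(np.mean(t ** 2))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss with Euclidean distances."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


def combined_loss(anchor, positive, negative, cohort_a, cohort_b, weight=0.1):
    """Assumed combination: triplet loss plus a weighted alignment term."""
    return (triplet_loss(anchor, positive, negative)
            + weight * statistical_alignment_loss(cohort_a, cohort_b))
```

In this sketch, `cohort_a` and `cohort_b` would be latent features of two patient groups expected to be similar (for example, the same age and BMI band), while the triplet inputs are sampled by diagnostic label. A differentiable version of the same arithmetic would be needed for actual training.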
