MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems

Tiago Mota, M. Rita Verdelho, Diogo J. Araújo, Alceu Bissoto, Carlos Santiago, Catarina Barata; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 2395-2403


The acquisition of different data modalities can enhance our knowledge and understanding of various diseases, paving the way for more personalized healthcare. Thus, medicine is progressively moving towards the generation of massive amounts of multi-modal data (e.g., molecular, radiology, and histopathology). While this may seem like an ideal environment to capitalize on data-centric machine learning approaches, most methods still focus on exploring a single modality or a pair of modalities due to a variety of reasons: i) lack of ready-to-use curated datasets; ii) difficulty in identifying the best multi-modal fusion strategy; and iii) missing modalities across patients. In this paper, we introduce a real-world multi-modal dataset called MMIST-ccRCC that comprises 2 radiology modalities (CT and MRI), histopathology, genomics, and clinical data from 618 patients with clear cell renal cell carcinoma (ccRCC). We provide single and multi-modal (early and late fusion) benchmarks in the task of 12-month survival prediction in the challenging scenario of one or more missing modalities for each patient, with missing rates that range from 26% for genomics data to more than 90% for MRI. We show that even with such severe missing rates, the fusion of modalities leads to improvements in the survival forecasting. Additionally, incorporating a strategy to generate the latent representations of the missing modalities given the available ones further improves the performance, highlighting a potential complementarity across modalities. Our dataset and code are available here:
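
As an illustration of the missing-modality setting described in the abstract, the following is a minimal sketch of late fusion that averages per-modality survival probabilities over whichever modalities a patient actually has, together with a helper that computes a per-modality missing rate. It is not the authors' implementation: the modality names, the function names `late_fusion` and `missing_rate`, the use of `None` to mark a missing modality, and the simple averaging rule are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# Hypothetical modality names, chosen only to mirror the abstract.
MODALITIES = ("ct", "mri", "histopathology", "genomics", "clinical")


def late_fusion(probs: Dict[str, Optional[float]]) -> Optional[float]:
    """Average per-modality 12-month survival probabilities.

    A modality that is absent from `probs` or mapped to None is treated
    as missing and skipped. Returns None if no modality is available.
    """
    available = []
    for name in MODALITIES:
        p = probs.get(name)
        if p is not None:
            available.append(p)
    if not available:
        return None
    return sum(available) / len(available)


def missing_rate(patients: List[Dict[str, Optional[float]]], modality: str) -> float:
    """Fraction of patients for whom `modality` is missing."""
    missing = sum(1 for p in patients if p.get(modality) is None)
    return missing / len(patients)


# Toy data (invented for illustration, not taken from the dataset).
toy_patients = [
    {"ct": 0.5, "genomics": 0.75},
    {"ct": 0.25, "genomics": None},
    {"ct": 0.5, "mri": 0.5, "genomics": 0.5},
    {"clinical": 1.0, "genomics": 0.5},
]
fused = [late_fusion(p) for p in toy_patients]
```

Early fusion would instead combine modality features before a single predictor, which requires some placeholder or generated representation for each missing modality; the sketch above sidesteps that by fusing only at the decision level.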

Related Material

@InProceedings{Mota_2024_CVPR,
    author    = {Mota, Tiago and Verdelho, M. Rita and Ara\'ujo, Diogo J. and Bissoto, Alceu and Santiago, Carlos and Barata, Catarina},
    title     = {MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {2395-2403}
}