Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

Hao Li, Ying Chen, Yifei Chen, Rongshan Yu, Wenxian Yang, Liansheng Wang, Bowen Ding, Yuchen Han; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 11398-11407

Abstract


Whole Slide Image (WSI) classification is often formulated as a Multiple Instance Learning (MIL) problem. Recently Vision-Language Models (VLMs) have demonstrated remarkable performance in WSI classification. However existing methods leverage coarse-grained pathogenetic descriptions for visual representation supervision which are insufficient to capture the complex visual appearance of pathogenetic images hindering the generalizability of models on diverse downstream tasks. Additionally processing high-resolution WSIs can be computationally expensive. In this paper we propose a novel "Fine-grained Visual-Semantic Interaction" (FiVE) framework for WSI classification. It is designed to enhance the model's generalizability by leveraging the interaction between localized visual patterns and fine-grained pathological semantics. Specifically with meticulously designed queries we start by utilizing a large language model to extract fine-grained pathological descriptions from various non-standardized raw reports. The output descriptions are then reconstructed into fine-grained labels used for training. By introducing a Task-specific Fine-grained Semantics (TFS) module we enable prompts to capture crucial visual information in WSIs which enhances representation learning and augments generalization capabilities significantly. Furthermore given that pathological visual patterns are redundantly distributed across tissue slices we sample a subset of visual instances during training. Our method demonstrates robust generalizability and strong transferability dominantly outperforming the counterparts on the TCGA Lung Cancer dataset with at least 9.19% higher accuracy in few-shot experiments. The code is available at: https://github.com/ls1rius/WSI_FiVE.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Li_2024_CVPR, author = {Li, Hao and Chen, Ying and Chen, Yifei and Yu, Rongshan and Yang, Wenxian and Wang, Liansheng and Ding, Bowen and Han, Yuchen}, title = {Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {11398-11407} }