LLM-Generated Rewrite and Context Modulation for Enhanced Vision Language Models in Digital Pathology

Cagla Deniz Bahadir, Gozde B. Akar, Mert R. Sabuncu; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 327-336

Abstract


Recent advancements in vision-language models (VLMs) have found important applications in medical imaging, particularly in digital pathology. VLMs demand large-scale datasets of image-caption pairs, which are often hard to obtain in medical domains. State-of-the-art VLMs in digital pathology have been pre-trained on datasets that are significantly smaller than their computer vision counterparts. Furthermore, the caption of a pathology slide often refers to only a small subset of the features in the image, an important point that is ignored in existing VLM pre-training schemes. Another important but under-appreciated issue is that the performance of state-of-the-art VLMs in zero-shot classification tasks can be sensitive to the choice of prompts. In this paper, we first employ language rewrites generated by a large language model (LLM) to enrich a public pathology image-caption dataset, and we make the enriched dataset publicly available. Our extensive experiments demonstrate that training with language rewrites boosts the performance of a state-of-the-art digital pathology VLM on downstream tasks such as zero-shot classification and text-to-image and image-to-text retrieval. We further leverage LLMs to demonstrate the sensitivity of zero-shot classification results to the choice of prompts, and we propose a scalable approach to characterize this sensitivity when comparing models. Finally, we present a novel context modulation layer that adjusts the image embeddings to better align them with the paired text, and we use context-specific language rewrites to train this layer. Our results show that the proposed context modulation framework can yield further substantial performance gains.
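The two evaluation ideas above, training with caption rewrites and measuring how zero-shot accuracy varies with the prompt wording, can be illustrated with a minimal, framework-free Python sketch. It assumes image and class-prompt embeddings have already been produced by some pretrained VLM encoder (not shown). The function names (`sample_caption`, `zero_shot_accuracy`, `prompt_sensitivity`), the uniform sampling between an original caption and its rewrites, and the choice to summarize sensitivity by mean, standard deviation, minimum and maximum accuracy over prompt sets are all illustrative assumptions, not the authors' implementation.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sample_caption(original: str, rewrites: Sequence[str], rng: random.Random) -> str:
    """Pick the original caption or one of its LLM rewrites uniformly at random.

    Assumption: uniform per-step sampling is one illustrative way to use
    rewrites as text augmentation, not necessarily the paper's exact scheme.
    """
    return rng.choice([original, *rewrites])


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (raises on an all-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def zero_shot_predict(image_emb: Sequence[float],
                      class_text_embs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class prompt most similar to the image embedding."""
    scores = [cosine(image_emb, t) for t in class_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def zero_shot_accuracy(image_embs: Sequence[Sequence[float]],
                       labels: Sequence[int],
                       class_text_embs: Sequence[Sequence[float]]) -> float:
    """Fraction of images whose zero-shot prediction matches the label."""
    correct = sum(
        zero_shot_predict(emb, class_text_embs) == label
        for emb, label in zip(image_embs, labels)
    )
    return correct / len(labels)


def prompt_sensitivity(
    image_embs: Sequence[Sequence[float]],
    labels: Sequence[int],
    prompt_sets: Dict[str, Sequence[Sequence[float]]],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Zero-shot accuracy per prompt set plus a summary of its spread.

    prompt_sets maps a hypothetical prompt-set name to one text embedding per class.
    """
    per_prompt = {
        name: zero_shot_accuracy(image_embs, labels, class_embs)
        for name, class_embs in prompt_sets.items()
    }
    values = list(per_prompt.values())
    summary = {
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }
    return per_prompt, summary
```

With real models, each prompt set would typically be a different LLM-generated phrasing of the class names; a wide gap between the minimum and maximum accuracy suggests that a comparison between two models based on a single prompt may not be reliable.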

Related Material


@InProceedings{Bahadir_2025_WACV,
    author    = {Bahadir, Cagla Deniz and Akar, Gozde B. and Sabuncu, Mert R.},
    title     = {LLM-Generated Rewrite and Context Modulation for Enhanced Vision Language Models in Digital Pathology},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {327-336}
}