[pdf]
[arXiv]
[bibtex]
@InProceedings{Kumar_2025_ICCV,
  author    = {Kumar, Gurucharan Marthi Krishna and Chadha, Aman and Mendola, Janine D. and Shmuel, Amir},
  title     = {MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2025},
  pages     = {1125-1135}
}
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation
Abstract
Medical image segmentation plays a key role in healthcare, enabling accurate diagnosis and treatment planning. Vision Transformers (ViTs) show strong potential for segmentation tasks, but their dependence on large datasets limits practical usage in clinical settings. This study explores whether integrating pre-trained Large Language Models (LLMs) with ViT-based segmentation models can enhance feature refinement and improve performance in data-constrained environments. We introduce MedVisionLlama, which combines ViT encoders with pre-trained Llama weights and applies Low-Rank Adaptation (LoRA) for fine-tuning in 3D medical image segmentation. Evaluated on the Medical Segmentation Decathlon dataset, the model consistently outperformed a standard ViT, showing improved generalization across MRI and CT modalities. It maintained stable segmentation quality even with limited training data and across varied anatomical structures. Activation maps revealed sharper and more stable attention to relevant regions. Ablation studies confirmed that the performance gains stemmed from LLM-based feature refinement rather than increased model complexity. MedVisionLlama offers a scalable and data-efficient solution for medical image segmentation. Source code and implementation are available at: https://github.com/AS-Lab/Marthi-et-al-2025-MedVisionLlama-Pre-Trained-LLM-Layers-to-Enhance-Medical-Image-Segmentation/
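The abstract mentions fine-tuning frozen pre-trained Llama layers with Low-Rank Adaptation (LoRA). A minimal NumPy sketch of the LoRA idea is given below; the shapes, variable names, and initialization are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import numpy as np

# Illustrative LoRA sketch (NOT the paper's code): the frozen weight W is
# left untouched, and only a low-rank update B @ A is trained on top of it.
rng = np.random.default_rng(0)

d_in, d_out, rank, alpha = 64, 64, 4, 8   # assumed toy dimensions

W = rng.standard_normal((d_out, d_in))        # frozen pre-trained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, rank))                   # zero-init: adapter starts as a no-op

def lora_forward(x):
    # y = x W^T + (alpha / rank) * x A^T B^T ; gradients flow only into A and B.
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

x = rng.standard_normal((2, d_in))
y = lora_forward(x)   # shape (2, d_out); equals x @ W.T while B is still zero
```

Because `B` is initialized to zero, the adapted layer reproduces the frozen pre-trained layer exactly at the start of fine-tuning, which is the standard LoRA design choice for stable training.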
Related Material
