@InProceedings{Zhao_2025_ICCV,
  author    = {Zhao, Yihao and Zhong, Enhao and Yuan, Cuiyun and Li, Yang and Zhao, Man and Li, Chunxia and Hu, Jun and Liu, Wei and Liu, Chenbin},
  title     = {Med-VLM: Enhancing Medical Image Segmentation Accuracy through Vision-Language Model},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2025},
  pages     = {7342-7352}
}
Med-VLM: Enhancing Medical Image Segmentation Accuracy through Vision-Language Model
Abstract
We propose Med-VLM (Medical Vision-Language Model), an approach that leverages textual descriptions of organs to improve segmentation accuracy in medical images. Existing medical image segmentation methods face several challenges: (1) current segmentation models often fail to incorporate valuable prior knowledge, such as detailed descriptions of organ locations and characteristics; (2) most text-visual models prioritize target identification rather than improving overall accuracy; and (3) approaches that do attempt to use prior knowledge for accuracy enhancement often fall short in effectively incorporating pre-trained models. To overcome these limitations, Med-VLM introduces several key innovations: low-rank adaptation, authoritative organ descriptions, BioBERT weights, and a feature mixer. We conducted a comprehensive evaluation of Med-VLM on three authoritative medical image datasets covering the segmentation of various human body parts. Our method outperformed existing state-of-the-art approaches, including LViT, MedSAM, SAM, and nnU-Net. We also designed a series of ablation experiments that systematically assess the contribution of each component of Med-VLM, providing insight into the model's performance characteristics.
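The abstract names low-rank adaptation (LoRA) as one of Med-VLM's key components. The sketch below is not the paper's implementation; it is a minimal numpy illustration of the general LoRA idea the term referss to: a frozen pre-trained weight matrix W is adapted by a trainable low-rank update B·A, so only r·(d_in + d_out) parameters are trained instead of d_in·d_out. All names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4  # illustrative dimensions, not from the paper

W = rng.standard_normal((d_out, d_in))     # frozen pre-trained weights
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # zero-initialized, so the update starts as a no-op

def adapted_forward(x):
    """Forward pass through the adapted layer: (W + B @ A) @ x."""
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
# With B = 0, the adapted layer reproduces the frozen layer exactly.
assert np.allclose(adapted_forward(x), W @ x)
# The trainable factors are far smaller than the frozen matrix.
assert A.size + B.size < W.size
```

In a fine-tuning setting, only A and B receive gradient updates while W stays fixed, which keeps the adaptation cheap enough to apply to large pre-trained backbones.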
