CLIPath: Fine-Tune CLIP with Visual Feature Fusion for Pathology Image Analysis Towards Minimizing Data Collection Efforts

Zhengfeng Lai, Zhuoheng Li, Luca Cerny Oliveira, Joohi Chauhan, Brittany N. Dugger, Chen-Nee Chuah; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 2374-2380

Abstract


Contrastive Language-Image Pre-training (CLIP) has shown its ability to learn distinctive visual representations and generalize to various downstream vision tasks. However, its applicability to the classification of pathology images with limited labeled data is still under study, owing to the substantial domain shift between the large natural-image datasets of the source domain and the small-scale pathology images of the target domain, as well as to overfitting. In this work, we first explore the zero-shot transferability of CLIP on pathology classification tasks and benchmark its performance. We then propose Residual Feature Connection (RFC) to fine-tune CLIP with a small number of trainable parameters. RFC aims to fuse the task-specific knowledge learned from the target domain with the original knowledge CLIP acquired during pre-training. We show that RFC can adapt pre-trained CLIP to downstream pathology tasks and achieve good performance with only a few annotated samples. Specifically, RFC improves accuracy by over 19% when using only 0.1% of the labeled data in PCam, with only 10 minutes of fine-tuning on a single GPU.
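The abstract does not give RFC's exact formulation, so the following is only a minimal pure-Python sketch of one plausible residual feature fusion, assuming a CLIP-Adapter-style design. A small bottleneck MLP (hypothetical weights `w1`, `w2`) transforms the frozen CLIP image feature. A hypothetical ratio `alpha` then blends the result with the original feature, which is passed to a zero-shot-style classifier built from cosine similarity against per-class text embeddings. The vectors are random stand-ins for real CLIP embeddings, `logit_scale=100.0` is an assumed value chosen to be close to CLIP's learned scale, and only the forward pass is shown (no training loop).

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ResidualFeatureFusion:
    """Hypothetical RFC-style module (CLIP-Adapter-style assumption).

    A bottleneck MLP (dim -> hidden -> dim) adapts the frozen CLIP image
    feature, and its output is blended with the original feature using alpha.
    Only w1 and w2 would be trained; the CLIP encoders stay frozen.
    """

    def __init__(self, dim, hidden, alpha=0.2, seed=0):
        rng = random.Random(seed)
        self.alpha = alpha
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
                   for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
                   for _ in range(dim)]

    def num_trainable(self):
        """Count the adapter weights (the only parameters that would be tuned)."""
        return sum(len(row) for row in self.w1) + sum(len(row) for row in self.w2)

    def __call__(self, clip_feature):
        hidden = [max(0.0, h) for h in matvec(self.w1, clip_feature)]  # ReLU
        adapted = matvec(self.w2, hidden)  # task-specific feature
        # Residual blend: alpha * adapted + (1 - alpha) * original CLIP feature.
        fused = [self.alpha * a + (1.0 - self.alpha) * f
                 for a, f in zip(adapted, clip_feature)]
        return l2_normalize(fused)


def classify(image_feature, text_features, logit_scale=100.0):
    """Softmax over scaled cosine similarities to per-class text embeddings."""
    img = l2_normalize(image_feature)
    logits = [logit_scale * sum(i * t for i, t in zip(img, l2_normalize(txt)))
              for txt in text_features]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    dim = 8
    rng = random.Random(1)
    # Random stand-ins for a CLIP image embedding and two class-prompt embeddings.
    image_feat = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    text_feats = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2)]
    rfc = ResidualFeatureFusion(dim, hidden=2, alpha=0.2)
    print("zero-shot:", classify(image_feat, text_feats))
    print("fused:    ", classify(rfc(image_feat), text_feats))
    print("trainable parameters:", rfc.num_trainable())
```

In this sketch, setting `alpha` to 0 recovers the zero-shot CLIP prediction exactly. Larger values give the adapter's task-specific features more weight. Keeping `hidden` small keeps the trainable parameter count low, at `2 * dim * hidden`.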

Related Material


@InProceedings{Lai_2023_ICCV,
    author    = {Lai, Zhengfeng and Li, Zhuoheng and Oliveira, Luca Cerny and Chauhan, Joohi and Dugger, Brittany N. and Chuah, Chen-Nee},
    title     = {CLIPath: Fine-Tune CLIP with Visual Feature Fusion for Pathology Image Analysis Towards Minimizing Data Collection Efforts},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {2374-2380}
}