A Comparative Study of Vision Transformer Encoders and Few-Shot Learning for Medical Image Classification

Maxat Nurgazin, Nguyen Anh Tu; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 2513-2521

Abstract


Recently, computer vision has been significantly impacted by Vision Transformer (ViT) networks. These deep models have also succeeded in medical image classification. However, most existing deep learning-based methods rely on large amounts of labeled data to train reliable classifiers for accurate prediction. This requirement is often impractical in the medical field, where data is scarce and manual annotation is expensive. Therefore, this study explores the application of ViT in few-shot learning scenarios for medical image analysis, addressing the challenges posed by limited data availability. We evaluate various ViT models alongside few-shot learning algorithms (i.e., ProtoNet, MatchingNet, and Reptile), perform cross-domain experiments, and analyze the impact of data augmentation techniques. Our findings indicate that when combined with ProtoNets, ViT architectures outperform CNN-based counterparts and achieve competitive performance against state-of-the-art approaches on benchmark datasets. Cross-domain experiments further confirm the effectiveness of ViT models in few-shot medical image classification.
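
To make the evaluated setup concrete, below is a minimal sketch of a single ProtoNet episode with a pretrained ViT encoder. This is not the authors' code: the timm backbone name (vit_base_patch16_224), the frozen encoder, and the 2-way 1-shot episode shapes are illustrative assumptions; only the ViT-plus-ProtoNet pairing comes from the paper.

    # Minimal sketch: few-shot classification with a ViT encoder + ProtoNet.
    # Assumptions (not from the paper): timm backbone, frozen weights,
    # Euclidean distance to class prototypes.
    import torch
    import torch.nn.functional as F
    import timm

    # Pretrained ViT used as a feature extractor (num_classes=0 -> embeddings).
    encoder = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
    encoder.eval()

    @torch.no_grad()
    def protonet_episode(support, support_labels, query, n_way):
        """Classify query images against class prototypes.

        support: (n_way * k_shot, 3, 224, 224) labeled support images
        support_labels: (n_way * k_shot,) integer class ids in [0, n_way)
        query: (n_query, 3, 224, 224) unlabeled query images
        """
        z_support = encoder(support)   # (n_way * k_shot, d) embeddings
        z_query = encoder(query)       # (n_query, d)
        # Prototype = mean embedding of each class's support examples.
        prototypes = torch.stack(
            [z_support[support_labels == c].mean(dim=0) for c in range(n_way)]
        )                              # (n_way, d)
        # Nearest-prototype classification via negative Euclidean distance.
        dists = torch.cdist(z_query, prototypes)   # (n_query, n_way)
        return F.log_softmax(-dists, dim=1)        # log class probabilities

    # Example 2-way 1-shot episode; random tensors stand in for images.
    support = torch.randn(2, 3, 224, 224)
    labels = torch.tensor([0, 1])
    query = torch.randn(4, 3, 224, 224)
    print(protonet_episode(support, labels, query, n_way=2).shape)  # (4, 2)

Because ProtoNet needs no classifier head, only class means in embedding space, swapping the CNN backbone for a ViT (as compared in the paper) changes nothing but the encoder call.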

Related Material


[pdf]
[bibtex]
@InProceedings{Nurgazin_2023_ICCV,
    author    = {Nurgazin, Maxat and Tu, Nguyen Anh},
    title     = {A Comparative Study of Vision Transformer Encoders and Few-Shot Learning for Medical Image Classification},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {2513-2521}
}