Reverse Knowledge Distillation: Training a Large Model Using a Small One for Retinal Image Matching on Limited Data
Retinal image matching (RIM) plays a crucial role in monitoring disease progression and treatment response, as the retina is the only tissue where blood vessels can be directly observed. However, datasets with matched keypoints between temporally separated pairs of images are too scarce to train transformer-based models. Firstly, we release keypoint annotations for retinal images from multiple datasets to aid further research on RIM. Secondly, we propose a novel approach based on reverse knowledge distillation to train large models with limited data while preventing overfitting. We propose architectural modifications to a CNN-based semi-supervised method called SuperRetina that improve its results on a publicly available dataset. We then train a computationally heavier model based on a vision transformer encoder, utilizing the lighter CNN-based model. This approach, which we call reverse knowledge distillation (RKD), further improves the matching results, even though it contrasts with conventional knowledge distillation, where training lighter models from heavier ones is the norm. Further, we show that our technique generalizes to other domains, such as facial landmark matching.
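The core training loop described above can be sketched in miniature. The following is a hypothetical NumPy toy, not the paper's implementation: a frozen lightweight "teacher" (standing in for the CNN-based SuperRetina) produces pseudo-target heatmaps, and a heavier, over-parameterized "student" (standing in for the ViT-based model) is fit to them with a distillation (MSE) loss. All names, shapes, and learning rates here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen lightweight teacher (placeholder: a fixed linear map).
# In the paper's setting this would be the trained CNN-based model
# producing keypoint heatmaps/descriptors; here it is just a stand-in.
W_teacher = rng.normal(size=(16, 8))

def teacher_heatmap(images):
    # Teacher pseudo-labels for a batch of flattened "images".
    return images @ W_teacher

# Toy data: 64 "images", each 16 features, mapped to 8-dim "heatmaps".
images = rng.normal(size=(64, 16))

# Heavier student: an over-parameterized two-layer linear model
# (stand-in for the vision-transformer encoder).
W1 = rng.normal(size=(16, 32)) * 0.1
W2 = rng.normal(size=(32, 8)) * 0.1
lr = 0.05

losses = []
for step in range(500):
    targets = teacher_heatmap(images)   # teacher supervises the student
    hidden = images @ W1
    preds = hidden @ W2                 # student prediction
    err = preds - targets
    losses.append(np.mean(err ** 2))    # distillation (MSE) loss
    # Manual gradients of the MSE loss w.r.t. the student weights.
    gW2 = hidden.T @ err * (2 / err.size)
    gW1 = images.T @ (err @ W2.T) * (2 / err.size)
    W1 -= lr * gW1
    W2 -= lr * gW2

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The point of the sketch is only the direction of supervision: the small model's outputs are the targets, and the large model's capacity is constrained by matching them, which is what limits overfitting on scarce labeled data.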