Knowledge Distillation Dealing with Sample-wise Long-tail Problem

Tao Yu, Xu Zhao, Yongqi An, Ming Tang, Jinqiao Wang; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 2354-2370

Abstract


We discover that while knowledge distillation improves the overall performance of student models, the performance improvement for some samples in the tail is limited, an issue that is rarely addressed. These tail samples can lead to poor learning of the teacher's feature distribution in the corresponding regions of the feature space, thereby limiting the alignment between the student and the teacher. Since tail samples often lack clear label definitions in many tasks, we identify them by analyzing the average feature similarity from the teacher model. To improve knowledge distillation, we propose a Sample-wise Re-weighting (SRW) method, assigning different loss function weights to samples based on their average similarity. Experimental results show that our method enhances the performance of student models across different tasks and can be combined with various knowledge distillation methods. Additionally, our approach demonstrates advantages in foundation models such as the Segment Anything Model (SAM) and Contrastive Language-Image Pretraining (CLIP) models.
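To make the re-weighting idea concrete, below is a minimal PyTorch sketch of one way to weight a per-sample distillation loss by the average teacher feature similarity, as the abstract describes. It is an illustration only, not the authors' released implementation: the function names (`srw_weights`, `weighted_kd_loss`), the batch-wise similarity reference, and the softmax-based weight mapping are all assumptions.

```python
# Illustrative sketch of sample-wise re-weighting for knowledge distillation.
# Assumptions (not from the paper): weights are computed within the batch,
# and low average cosine similarity under the teacher marks a "tail" sample.
import torch
import torch.nn.functional as F


def srw_weights(teacher_feats: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Return a (N,) weight vector from (N, D) teacher features.

    Samples whose average cosine similarity to the rest of the batch is low
    (i.e. lying in sparse regions of the teacher's feature space) receive
    larger loss weights.
    """
    f = F.normalize(teacher_feats, dim=1)          # unit-norm features
    sim = f @ f.t()                                # (N, N) cosine similarity
    n = sim.size(0)
    sim = sim - torch.eye(n, device=sim.device)    # drop self-similarity
    avg_sim = sim.sum(dim=1) / (n - 1)             # average similarity per sample
    # Lower average similarity -> larger weight; softmax keeps weights positive,
    # rescaled so they sum to N (so the loss scale stays comparable).
    w = torch.softmax(-avg_sim / temperature, dim=0) * n
    return w.detach()


def weighted_kd_loss(student_logits, teacher_logits, teacher_feats, T: float = 4.0):
    """Vanilla logit distillation, re-weighted per sample via srw_weights."""
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="none",
    ).sum(dim=1) * (T * T)                         # (N,) per-sample KD loss
    w = srw_weights(teacher_feats)
    return (w * kd).mean()
```

In this sketch the weights are detached so they only rescale the gradient of each sample's distillation term rather than becoming a learning target themselves.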

Related Material


[pdf]
[bibtex]
@InProceedings{Yu_2024_ACCV,
    author    = {Yu, Tao and Zhao, Xu and An, Yongqi and Tang, Ming and Wang, Jinqiao},
    title     = {Knowledge Distillation Dealing with Sample-wise Long-tail Problem},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {2354-2370}
}