Dropout Connects Transformers and CNNs: Transfer General Knowledge for Knowledge Distillation

Bokyeung Lee, Jonghwan Hong, Hyunuk Shin, Bonwha Ku, Hanseok Ko; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 8335-8344

Abstract


Thanks to their long-range dependencies, Transformers obtain state-of-the-art performance in diverse research fields such as computer vision and audio processing. In practical scenarios, however, convolutional neural networks (CNNs) are used more often than Transformers due to their lower complexity. Transformer-to-CNN knowledge distillation (KD), where the Transformer is the teacher and the CNN is the student, is therefore in demand and receiving attention. In Transformer-to-CNN KD training, the capacity gap arising from structural differences between the teacher and student networks is the main cause of performance degradation of the student network, unlike in homogeneous-architecture KD. Previous KD studies, however, transfer all of a teacher's knowledge to the student without considering structural differences; they cannot overcome the problems these differences cause and show poor performance in Transformer-to-CNN KD. In this paper, we identify general and specific knowledge in the feature maps of the teacher and student, where general and specific knowledge are the generalized and non-generalized feature representations, respectively. We propose a novel KD framework, DropKD, which extracts general knowledge from the teacher and student while removing specific knowledge, and then lets the general knowledge of the student network learn from the general knowledge of the teacher. DropKD empowers the student network to achieve generalization by effectively managing general and specific knowledge. Through extensive experiments on challenging image classification datasets, we demonstrate that the proposed method is superior to existing methods.
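The abstract does not spell out how general knowledge is separated from specific knowledge, but the title suggests dropout plays that role. The sketch below is a hypothetical reading of the idea, not the paper's exact DropKD formulation: a shared channel-wise dropout mask removes a random subset of channels (standing in for "specific knowledge") from both the teacher's and the projected student's feature maps, and only the surviving channels are matched. The module name DropKDStyleLoss, the 1x1 projection head, the bilinear resizing, and the MSE matching loss are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DropKDStyleLoss(nn.Module):
    """Illustrative Transformer-to-CNN feature-distillation loss (assumption,
    not the authors' implementation): drop a shared random subset of channels
    in both feature maps and match what remains."""

    def __init__(self, c_student: int, c_teacher: int, drop_p: float = 0.5):
        super().__init__()
        # Learned 1x1 projection aligning student channels to the teacher's
        # channel dimension (hypothetical alignment head).
        self.proj = nn.Conv2d(c_student, c_teacher, kernel_size=1)
        self.drop_p = drop_p

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        # student_feat: (B, Cs, Hs, Ws) CNN feature map
        # teacher_feat: (B, Ct, Ht, Wt) Transformer tokens reshaped to a 2D grid
        s = self.proj(student_feat)
        if s.shape[-2:] != teacher_feat.shape[-2:]:
            # Resize the student map to the teacher's spatial resolution.
            s = F.interpolate(s, size=teacher_feat.shape[-2:],
                              mode="bilinear", align_corners=False)
        # Shared channel-wise dropout mask: surviving channels act as a proxy
        # for "general knowledge"; dropped channels for "specific knowledge".
        keep = (torch.rand(1, teacher_feat.shape[1], 1, 1,
                           device=s.device) > self.drop_p).float()
        scale = keep / (1.0 - self.drop_p)  # inverted-dropout rescaling
        # Match the retained (general) parts of the two representations.
        return F.mse_loss(s * scale, teacher_feat.detach() * scale)


# Example usage with made-up shapes: a ViT-like teacher grid of 768 channels
# at 14x14, and a CNN student feature map of 256 channels at 28x28.
loss_fn = DropKDStyleLoss(c_student=256, c_teacher=768, drop_p=0.5)
s_feat = torch.randn(4, 256, 28, 28)
t_feat = torch.randn(4, 768, 14, 14)
loss = loss_fn(s_feat, t_feat)
```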

Related Material


[bibtex]
@InProceedings{Lee_2025_WACV,
    author    = {Lee, Bokyeung and Hong, Jonghwan and Shin, Hyunuk and Ku, Bonwha and Ko, Hanseok},
    title     = {Dropout Connects Transformers and CNNs: Transfer General Knowledge for Knowledge Distillation},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {8335-8344}
}