Scaled Decoupled Distillation
Abstract
Logit knowledge distillation has attracted increasing attention due to its practicality in recent studies. However, it often suffers from inferior performance compared to feature knowledge distillation. In this paper, we argue that existing logit-based methods may be sub-optimal since they only leverage the global logit output, which couples multiple semantic knowledge. This may transfer ambiguous knowledge to the student and mislead its learning. To this end, we propose a simple but effective method, i.e., Scale Decoupled Distillation (SDD), for logit knowledge distillation. SDD decouples the global logit output into multiple local logit outputs and establishes distillation pipelines for them. This helps the student mine and inherit fine-grained and unambiguous logit knowledge. Moreover, the decoupled knowledge can be further divided into consistent and complementary logit knowledge, which transfer the semantic information and sample ambiguity, respectively. By increasing the weight of the complementary parts, SDD can guide the student to focus more on ambiguous samples, improving its discrimination ability. Extensive experiments on several benchmark datasets demonstrate the effectiveness of SDD for a wide range of teacher-student pairs, especially in fine-grained classification tasks. Code is available at: https://github.com/shicaiwei123/SDD-CVPR2024
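To make the idea concrete, below is a minimal PyTorch-style sketch of scale-decoupled logit distillation; it is not the authors' released implementation (see the repository above for that). It assumes teacher/student feature maps feat_t/feat_s and their linear classifiers fc_t/fc_s, decouples logits by classifying multi-scale average-pooled regions, and up-weights local cells whose teacher prediction disagrees with the global teacher prediction (treated here as the "complementary" part) by a hypothetical factor beta.

import torch
import torch.nn.functional as F

def scale_decoupled_kd_loss(feat_t, feat_s, fc_t, fc_s,
                            scales=(1, 2, 4), temperature=4.0, beta=2.0):
    """Sketch of a scale-decoupled logit distillation loss.

    feat_t, feat_s: teacher/student feature maps, shape (B, C_t, H, W) / (B, C_s, H, W)
    fc_t, fc_s:     teacher/student linear classifiers applied after pooling
    scales:         grid sizes for pooling local regions (scale 1 recovers the global logit)
    beta:           assumed extra weight on "complementary" local logits whose
                    prediction disagrees with the global teacher prediction
    """
    # Global teacher prediction, used to split local knowledge into
    # consistent vs. complementary parts.
    glob_t = fc_t(F.adaptive_avg_pool2d(feat_t, 1).flatten(1))
    glob_pred = glob_t.argmax(dim=1)  # (B,)

    loss = 0.0
    for s in scales:
        # Pool features on an s x s grid, then classify each cell to obtain local logits.
        pt = F.adaptive_avg_pool2d(feat_t, s).flatten(2).transpose(1, 2)  # (B, s*s, C_t)
        ps = F.adaptive_avg_pool2d(feat_s, s).flatten(2).transpose(1, 2)  # (B, s*s, C_s)
        logit_t = fc_t(pt)  # (B, s*s, num_classes)
        logit_s = fc_s(ps)

        # Per-cell KL divergence between softened teacher and student distributions.
        log_p_s = F.log_softmax(logit_s / temperature, dim=-1)
        p_t = F.softmax(logit_t / temperature, dim=-1)
        kl = F.kl_div(log_p_s, p_t, reduction='none').sum(-1) * temperature ** 2  # (B, s*s)

        # Cells agreeing with the global prediction carry "consistent" knowledge;
        # disagreeing cells carry "complementary" knowledge and get a larger weight
        # so the student attends more to ambiguous regions.
        local_pred = logit_t.argmax(dim=-1)                       # (B, s*s)
        consistent = (local_pred == glob_pred.unsqueeze(1)).float()
        weight = consistent + beta * (1.0 - consistent)
        loss = loss + (weight * kl).mean()

    return loss

In practice this distillation term would be added to the usual cross-entropy loss on the student's global logits, with beta and the scale set treated as tunable hyperparameters.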
Related Material

[pdf] [arXiv]

[bibtex]
@InProceedings{Wei_2024_CVPR,
  author    = {Wei, Shicai and Luo, Chunbo and Luo, Yang},
  title     = {Scaled Decoupled Distillation},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2024},
  pages     = {15975-15983}
}