Tree-Like Decision Distillation

Song, Jie; Zhang, Haofei; Wang, Xinchao; Xue, Mengqi; Chen, Ying; Sun, Li; Tao, Dacheng; Song, Mingli

Jie Song, Haofei Zhang, Xinchao Wang, Mengqi Xue, Ying Chen, Li Sun, Dacheng Tao, Mingli Song; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 13488-13497

Abstract

Knowledge distillation pursues a diminutive yet well-behaved student network by harnessing the knowledge learned by a cumbersome teacher model. Prior methods achieve this by making the student imitate shallow behaviors, such as soft targets, features, or attention, of the teacher. In this paper, we argue that what really matters for distillation is the intrinsic problem-solving process captured by the teacher. By dissecting the decision process in a layer-wise manner, we found that the decision-making procedure in the teacher model is conducted in a coarse-to-fine manner, where coarse-grained discrimination (e.g., animal vs vehicle) is attained in early layers, and fine-grained discrimination (e.g., dog vs cat, car vs truck) in latter layers. Motivated by this observation, we propose a new distillation method, dubbed as Tree-like Decision Distillation (TDD), to endow the student with the same problem-solving mechanism as that of the teacher. Extensive experiments demonstrated that TDD yields competitive performance compared to state of the arts. More importantly, it enjoys better interpretability due to its interpretable decision distillation instead of dark knowledge distillation.

Related Material

[pdf] [supp]

[bibtex]

@InProceedings{Song_2021_CVPR, author = {Song, Jie and Zhang, Haofei and Wang, Xinchao and Xue, Mengqi and Chen, Ying and Sun, Li and Tao, Dacheng and Song, Mingli}, title = {Tree-Like Decision Distillation}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2021}, pages = {13488-13497} }