- [pdf] [code]
Feature Decoupled Knowledge Distillation via Spatial Pyramid Pooling
Knowledge distillation (KD) is an effective and widely used model-compression technique that enables the deployment of deep networks in low-memory or fast-execution scenarios. Feature-based knowledge distillation is an important branch of KD that leverages intermediate layers to supervise the training of a student network. However, a potential mismatch between intermediate layers can be counterproductive during training. In this paper, we propose a novel distillation framework, termed Decoupled Spatial Pyramid Pooling Knowledge Distillation, to distinguish the importance of regions in feature maps. Specifically, we reveal that (1) spatial pyramid pooling is an outstanding way to define the knowledge and (2) the lower-activation regions in feature maps play a more important role in KD. Our experiments on CIFAR-100 and Tiny-ImageNet achieve state-of-the-art results.
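A minimal NumPy sketch of the spatial pyramid pooling step the abstract refers to: a feature map is pooled over a pyramid of grids and the results are concatenated into a fixed-length vector. The function name, the use of max pooling, and the {1×1, 2×2, 4×4} pyramid levels are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def spatial_pyramid_pool(feat, levels=(1, 2, 4)):
    """Pool a (C, H, W) feature map over a pyramid of grids and
    concatenate the bin-wise results into a fixed-length vector.

    Hypothetical helper illustrating the SPP idea; pooling type and
    pyramid levels are assumptions for this sketch.
    """
    C, H, W = feat.shape
    pooled = []
    for n in levels:
        # split the spatial dims into an n x n grid of bins
        h_edges = np.linspace(0, H, n + 1).astype(int)
        w_edges = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                bin_ = feat[:, h_edges[i]:h_edges[i + 1],
                               w_edges[j]:w_edges[j + 1]]
                pooled.append(bin_.max(axis=(1, 2)))  # max-pool each bin
    # output length is C * sum(n * n for n in levels), independent of H, W
    return np.concatenate(pooled)

# toy example: an 8-channel 16x16 feature map
vec = spatial_pyramid_pool(np.random.rand(8, 16, 16))
print(vec.shape)  # (8 * (1 + 4 + 16),) = (168,)
```

Because the output length depends only on the channel count and the pyramid levels, teacher and student vectors can be compared directly even when their spatial resolutions differ.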