- [pdf] [supp] [code]
Rethinking Online Knowledge Distillation with Multi-Exits
Online knowledge distillation is a method to train multiple networks simultaneously from scratch by distilling knowledge among them. An efficient way to do this is to attach auxiliary classifiers (called exits) to the main network. However, in this multi-exit approach, important questions remain unanswered by previous studies: What structure should be used for the exits? What makes a good teacher for distillation? How should the overall training loss be constructed? In this paper, we propose a new online knowledge distillation method using multi-exits that answers these questions. First, we examine the influence of the exit structure on the performance of the main network, and propose a bottleneck structure that improves performance across a wide range of main network architectures. Second, we propose a new distillation teacher built from an ensemble of all the classifiers (the main network and the exits), exploiting the diversity in their outputs and features. Third, we propose a new technique for forming the overall training loss that balances classification losses and distillation losses for effective training of the whole network. We term the proposed method balanced exit-ensemble distillation (BEED). Experimental results demonstrate that our method achieves significant improvements in classification performance on various popular convolutional neural network (CNN) architectures. Code is available at https://github.com/hjdw2/BEED.
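The general shape of exit-ensemble distillation described above can be sketched as follows. This is a minimal, hypothetical NumPy illustration, not the authors' implementation: the function name `exit_ensemble_loss`, the temperature `T`, and the balance weight `alpha` are all assumptions introduced for illustration; the released code at the repository above is the authoritative reference.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def exit_ensemble_loss(logits_list, labels, T=3.0, alpha=0.5):
    """Hypothetical sketch of an exit-ensemble distillation loss.

    logits_list: logits from every classifier (exits + main network),
                 each of shape (batch, num_classes).
    labels:      integer class labels, shape (batch,).
    The teacher is the ensemble (average) of all classifiers' softened
    outputs; each classifier is trained with a weighted sum of its
    cross-entropy loss and a KL distillation loss toward the teacher.
    """
    # Ensemble teacher: mean of temperature-softened probabilities.
    teacher = np.mean([softmax(l, T) for l in logits_list], axis=0)
    n = labels.shape[0]
    total = 0.0
    for logits in logits_list:
        # Classification loss: cross-entropy with the ground-truth labels.
        p = softmax(logits)
        ce = -np.log(p[np.arange(n), labels] + 1e-12).mean()
        # Distillation loss: KL(teacher || student) at temperature T,
        # rescaled by T^2 as is standard in distillation.
        q = softmax(logits, T)
        kd = (teacher * (np.log(teacher + 1e-12)
                         - np.log(q + 1e-12))).sum(axis=-1).mean() * T**2
        # alpha balances classification vs. distillation terms.
        total += (1 - alpha) * ce + alpha * kd
    return total / len(logits_list)
```

The actual BEED loss additionally rebalances the per-exit terms and uses the proposed bottleneck exit structure; the sketch only shows the ensemble-teacher idea common to this family of methods.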