Self-Guidance: Improve Deep Neural Network Generalization via Knowledge Distillation
We present Self-Guidance, a simple method for training deep neural networks via knowledge distillation. The basic idea is to train a sub-network to match the predictions of the full network, hence the name "Self-Guidance". Under the "teacher-student" framework, we construct both teacher and student within the same target network: the student is a sub-network that randomly skips some portions of the full network, while the teacher is the full network itself, which can be regarded as the ensemble of all possible student networks. Training proceeds in a closed loop: (1) forward prediction performs two passes that generate the student and teacher predictions; (2) backward distillation transfers knowledge from the teacher back to the students. Comprehensive evaluations show that our approach improves the generalization ability of deep neural networks by a significant margin. The results demonstrate superior performance on both image classification (CIFAR-10 and CIFAR-100) and facial expression recognition (FER-2013 and RAF).
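As a rough illustration of the closed-loop training step described above, the sketch below uses a toy residual network in NumPy (all class and function names are hypothetical, not from the paper): the teacher pass keeps every block, the student pass randomly skips blocks, and a KL-divergence term measures how far the student prediction is from the teacher's.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical toy residual MLP standing in for the "full network".
class ResidualMLP:
    def __init__(self, dim=8, n_blocks=4, n_classes=3):
        self.blocks = [rng.standard_normal((dim, dim)) * 0.1
                       for _ in range(n_blocks)]
        self.head = rng.standard_normal((dim, n_classes)) * 0.1

    def forward(self, x, keep_mask=None):
        # keep_mask[i] == False skips residual block i,
        # yielding one of the possible student sub-networks.
        for i, w in enumerate(self.blocks):
            if keep_mask is None or keep_mask[i]:
                x = x + np.tanh(x @ w)
        return x @ self.head  # class logits

def kl_div(p, q, eps=1e-12):
    # KL(p || q), averaged over the batch.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps)),
                        axis=-1).mean())

net = ResidualMLP()
x = rng.standard_normal((5, 8))

# Forward pass 1: teacher = full network (all blocks kept).
teacher_logits = net.forward(x)
# Forward pass 2: student = sub-network that randomly skips blocks.
keep = rng.random(len(net.blocks)) > 0.5
student_logits = net.forward(x, keep_mask=keep)

# Distillation loss: student is pushed toward the teacher prediction
# (in training, gradients would flow through the student pass only).
distill_loss = kl_div(softmax(teacher_logits), softmax(student_logits))
```

In an actual training loop this distillation term would typically be combined with the standard cross-entropy loss on the labels and minimized by backpropagation, with the teacher prediction treated as a fixed target.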