Improved Knowledge Distillation for Training Fast Low Resolution Face Recognition Model

Mengjiao Wang, Rujie Liu, Nada Hajime, Abe Narishige, Hidetsugu Uchida, Tomoaki Matsunami; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2019, pp. 0-0

Abstract


Low resolution (LR) face recognition (FR) is a challenging yet common problem for the FR task, especially in surveillance scenarios. The issue addressed here is not only to build an LR-FR model, but, more importantly, to make it run fast. We adopt the knowledge distillation method for this task, where the teacher's knowledge is 'distilled' into a small student model by guiding its training process. For the LR-FR task, the original knowledge distillation scheme would first update the teacher's weights by fine-tuning it on an LR-augmented training set, and then train the student model on the same training set under the updated teacher's guidance. The problem with this method is that tuning the weights of the large teacher model is time-consuming, especially for large-scale datasets. In this paper, we propose an improved scheme that avoids retraining the teacher while still being able to train the small model for the LR-FR task. Different from the original scheme, the training sets for the teacher and student models are no longer the same: the teacher's training set remains unchanged, while the student's is LR-augmented. It therefore becomes unnecessary to update the teacher model, since its training set is unchanged; only the small student model needs to be trained under the original teacher's guidance. This speeds up the whole training process, especially for large-scale datasets. However, the different training sets for teacher and student increase the data distribution discrepancy. To address this, we constrain the multi-kernel maximum mean discrepancy (MK-MMD) between their outputs to reduce this influence. Experimental results show our method accelerates the training process by about 5 times while preserving accuracy. Our student model achieves the same level of accuracy as the state of the art on LFW and SCFace, achieves 3x acceleration compared to the teacher model, and takes only 35 ms to run on a CPU.

Corresponding author: wangmengjiao@cn.fujitsu.com
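The abstract describes two trainable components: a standard soft-target distillation term from the fixed (non-retrained) teacher, and an MK-MMD term that aligns the student's embeddings of LR-augmented inputs with the teacher's embeddings of the original-resolution inputs. The PyTorch sketch below illustrates one plausible form of such a combined loss; the function names, kernel bandwidths, temperature `T`, and weights `alpha`/`beta` are illustrative assumptions, not values or code from the paper.

```python
import torch
import torch.nn.functional as F


def multi_kernel_mmd(x, y, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Biased multi-kernel (sum of Gaussian kernels) MMD^2 between two batches.

    x: teacher embeddings on original-resolution faces, shape (N, D)
    y: student embeddings on LR-augmented faces,        shape (N, D)
    sigmas: RBF bandwidths (placeholder values, not from the paper).
    """
    def rbf(a, b):
        d2 = torch.cdist(a, b, p=2).pow(2)          # pairwise squared distances
        return sum(torch.exp(-d2 / (2.0 * s ** 2)) for s in sigmas)

    return rbf(x, x).mean() + rbf(y, y).mean() - 2.0 * rbf(x, y).mean()


def distillation_loss(student_logits, teacher_logits,
                      student_emb, teacher_emb,
                      labels, T=4.0, alpha=0.5, beta=1.0):
    """Hypothetical combined loss: soft-target KD + cross-entropy + MK-MMD."""
    # Hinton-style soft-target distillation from the frozen teacher.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Supervised classification loss on the student's own predictions.
    hard = F.cross_entropy(student_logits, labels)
    # MK-MMD term to shrink the distribution gap introduced by feeding the
    # teacher and student different (HR vs. LR-augmented) training data.
    # The teacher is detached: its weights are never updated.
    mmd = multi_kernel_mmd(teacher_emb.detach(), student_emb)
    return alpha * soft + (1.0 - alpha) * hard + beta * mmd
```

In this reading of the scheme, only the student's parameters receive gradients, which is what removes the costly teacher fine-tuning step from the training pipeline.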

Related Material


[pdf]
[bibtex]
@InProceedings{Wang_2019_ICCV,
author = {Wang, Mengjiao and Liu, Rujie and Hajime, Nada and Narishige, Abe and Uchida, Hidetsugu and Matsunami, Tomoaki},
title = {Improved Knowledge Distillation for Training Fast Low Resolution Face Recognition Model},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2019}
}