Multi-Scale Aligned Distillation for Low-Resolution Detection

Lu Qi, Jason Kuen, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya Jia; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 14443-14453


In instance-level detection tasks (e.g., object detection), reducing input resolution is an easy option to improve runtime efficiency. However, this option severely hurts the detection performance. This paper focuses on boosting the performance of a low-resolution model, by distilling knowledge from a high/multi-resolution model. We first identify the challenge of applying knowledge distillation to teacher and student networks that act on different input resolutions. To tackle the challenge, we explore the idea of spatially aligning feature maps between models of different input resolutions, by shifting the position of the feature pyramid structure. With the alignment idea, we introduce aligned multi-scale training to train a multi-scale teacher that can distill its knowledge seamlessly to a low-resolution student. Furthermore, we propose cross feature-level fusion to dynamically fuse the multi-resolution features of the same teacher, to better guide the student. On several instance-level detection tasks and datasets, the low-resolution models trained via our approach perform competitively with high-resolution models trained via conventional multi-scale training, while outperforming the latter's low-resolution models by 2.1% to 3.6% in mAP.
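The alignment idea can be illustrated with a little shape arithmetic. The following is a minimal sketch, not the authors' implementation. It assumes pyramid level l has stride 2**l, that the low-resolution input is the high-resolution input downsampled by a factor of 2, and that feature sizes use floor division. The function names `pyramid_shapes` and `aligned_pairs` and the level range 3 to 7 are hypothetical choices made for illustration. Under these assumptions, shifting the student's pyramid down by one level gives feature maps with the same spatial size as the teacher's, which is what would allow a feature-level distillation loss to compare them position by position.

```python
def pyramid_shapes(height, width, levels):
    """Spatial size (h, w) of each pyramid level, assuming level l has stride 2**l."""
    return {l: (height // 2**l, width // 2**l) for l in levels}


def aligned_pairs(height, width, high_levels, factor_log2=1):
    """Pair each high-resolution level with the shifted low-resolution level.

    The low-resolution input is assumed to be the high-resolution input
    downsampled by 2**factor_log2 (an assumption for this sketch), and the
    low-resolution pyramid is shifted down by factor_log2 levels.
    """
    high = pyramid_shapes(height, width, high_levels)
    low_levels = [l - factor_log2 for l in high_levels]
    low = pyramid_shapes(height // 2**factor_log2, width // 2**factor_log2, low_levels)
    return [(hl, ll, high[hl], low[ll]) for hl, ll in zip(high_levels, low_levels)]


if __name__ == "__main__":
    # Hypothetical input size and level range, chosen only for illustration.
    for hl, ll, hs, ls in aligned_pairs(800, 1280, [3, 4, 5, 6, 7]):
        print(f"high-res P{hl} {hs}  <->  low-res P{ll} {ls}")
```

Without the shift, the same level index in the low-resolution model would produce a map half as large in each dimension, so teacher and student features would not line up spatially.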

Related Material

@InProceedings{Qi_2021_CVPR,
    author    = {Qi, Lu and Kuen, Jason and Gu, Jiuxiang and Lin, Zhe and Wang, Yi and Chen, Yukang and Li, Yanwei and Jia, Jiaya},
    title     = {Multi-Scale Aligned Distillation for Low-Resolution Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {14443-14453}
}