Space-Time Distillation for Video Super-Resolution

Zeyu Xiao, Xueyang Fu, Jie Huang, Zhen Cheng, Zhiwei Xiong; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 2113-2122


Compact video super-resolution (VSR) networks can be easily deployed on resource-limited devices, e.g., smartphones and wearable devices, but show considerable performance gaps compared with complicated VSR networks that require a large amount of computing resources. In this paper, we aim to improve the performance of compact VSR networks without changing their original architectures, through a knowledge distillation approach that transfers knowledge from a complicated VSR network to a compact one. Specifically, we propose a space-time distillation (STD) scheme to exploit both spatial and temporal knowledge in the VSR task. For space distillation, we extract from both networks spatial attention maps that hint at the high-frequency video content, and use them to transfer spatial modeling ability. For time distillation, we narrow the performance gap between compact and complicated models by distilling the feature similarity of the temporal memory cells, which are encoded by a ConvLSTM from the sequence of feature maps generated for the training clips. STD can be easily incorporated into the training of any network without changing its original architecture. Experimental results on standard benchmarks demonstrate that, in resource-constrained situations, the proposed method notably improves the performance of existing VSR networks without increasing the inference time.
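The two distillation terms described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several details are assumptions: the spatial attention map follows the common attention-transfer recipe (channel-wise mean of |F|^p, flattened and L2-normalized); the temporal term assumes that "feature similarity" means the cosine similarity between memory-cell states at different time steps; and both terms are compared with a mean squared error. The ConvLSTM itself is omitted, its cell states are simply passed in as arrays, and all function names are hypothetical.

```python
import numpy as np


def spatial_attention(feat, p=2, eps=1e-8):
    """feat: (C, H, W) feature map. Returns an L2-normalized attention vector of length H*W.

    Assumption: attention = mean over channels of |F|^p (attention-transfer style).
    """
    att = np.mean(np.abs(feat) ** p, axis=0)  # aggregate channels -> (H, W)
    att = att.reshape(-1)                      # flatten spatial positions
    return att / (np.linalg.norm(att) + eps)   # normalize so scale does not matter


def space_distillation_loss(feat_student, feat_teacher, p=2):
    """MSE between student and teacher attention maps.

    Channel counts may differ; only the spatial size (H, W) must match.
    """
    a_s = spatial_attention(feat_student, p)
    a_t = spatial_attention(feat_teacher, p)
    return float(np.mean((a_s - a_t) ** 2))


def similarity_matrix(cells, eps=1e-8):
    """cells: (T, C, H, W) hypothetical ConvLSTM memory-cell states over T time steps.

    Returns a (T, T) matrix of cosine similarities between time steps.
    """
    flat = cells.reshape(cells.shape[0], -1)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + eps)
    return flat @ flat.T


def time_distillation_loss(cells_student, cells_teacher):
    """MSE between the student's and teacher's (T, T) similarity matrices."""
    s = similarity_matrix(cells_student)
    t = similarity_matrix(cells_teacher)
    return float(np.mean((s - t) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical shapes: the student has fewer channels than the teacher.
    f_s = rng.standard_normal((16, 8, 8))
    f_t = rng.standard_normal((64, 8, 8))
    c_s = rng.standard_normal((5, 16, 8, 8))  # 5 time steps
    c_t = rng.standard_normal((5, 64, 8, 8))
    print("space loss:", space_distillation_loss(f_s, f_t))
    print("time loss:", time_distillation_loss(c_s, c_t))
```

Because both terms compare normalized attention maps or T×T similarity matrices, the student and teacher may have different channel counts, which fits the goal of leaving the compact network's architecture unchanged. In training, such terms would typically be added, with some weighting, to the ordinary reconstruction loss; the weighting here is left unspecified.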

Related Material

@InProceedings{Xiao_2021_CVPR,
    author    = {Xiao, Zeyu and Fu, Xueyang and Huang, Jie and Cheng, Zhen and Xiong, Zhiwei},
    title     = {Space-Time Distillation for Video Super-Resolution},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {2113-2122}
}