An Empirical Investigation of Efficient Spatio-Temporal Modeling in Video Restoration

Fan, Yuchen; Yu, Jiahui; Liu, Ding; Huang, Thomas S.

Yuchen Fan, Jiahui Yu, Ding Liu, Thomas S. Huang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019, pp. 0-0

Abstract

We present an empirical investigation of efficient spatio-temporal modeling in video restoration tasks. To achieve a better speed-accuracy trade-off, our investigation covers the intersection of three dimensions in deep video restoration networks: spatial-wise, channel-wise and temporal-wise. We enumerate various network architectures ranging from 2D convolutional models to 3D convolutional models, and discuss their gains and losses in terms of training time, model size, boundary effects, prediction accuracy and the visual quality of restored videos. Under a strictly controlled computational budget, we also specifically explore the design inside each residual building block in a video restoration network, which consists of a mixture of 2D and 3D convolutional layers. Our findings are summarized as follows: (1) In 3D convolutional models, allocating more computation/channels to spatial convolution leads to better performance than allocating them to temporal convolution. (2) The best variant of the 3D convolutional models is better than the 2D convolutional models, but the performance gap is small. (3) Within a very limited range, performance can be improved by increasing the window size (5 frames for the 2D model) or the padding size (6 frames for the 3D model). Based on these findings, we introduce WDVR, a wide-activated 3D convolutional network for video restoration, which achieves better accuracy under similar computational budgets and runtime latency. Our solution based on WDVR also won 2nd place in three out of four tracks of the NTIRE 2019 Challenge for Video Super-Resolution and Deblurring.
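To make the notion of a controlled computational budget concrete, the following minimal sketch counts the learnable parameters of a 2D convolution, a full 3D convolution, and a 3D convolution factorized into a spatial and a temporal part. The channel width of 32, the kernel sizes, and the helper name `conv_params` are illustrative assumptions and are not taken from the paper; the paper's actual block designs may differ.

```python
from math import prod


def conv_params(c_in, c_out, kernel, bias=True):
    """Count learnable parameters of a dense convolution layer.

    `kernel` is a tuple of kernel extents, e.g. (3, 3) for a 2D
    convolution or (3, 3, 3) for a 3D convolution (time, height, width).
    """
    return c_in * c_out * prod(kernel) + (c_out if bias else 0)


# Hypothetical channel width, chosen only for illustration.
channels = 32

# Plain 2D convolution applied to each frame independently.
params_2d = conv_params(channels, channels, (3, 3))

# Full 3D convolution over time, height and width.
params_3d = conv_params(channels, channels, (3, 3, 3))

# Factorized alternative: a spatial (1x3x3) convolution followed by
# a temporal (3x1x1) convolution.
params_spatial = conv_params(channels, channels, (1, 3, 3))
params_temporal = conv_params(channels, channels, (3, 1, 1))
params_factorized = params_spatial + params_temporal

print(f"2D 3x3:           {params_2d}")
print(f"3D 3x3x3:         {params_3d}")
print(f"3D factorized:    {params_factorized}")
```

Under these assumed settings the full 3D layer carries roughly three times the parameters of the 2D layer, while the factorized variant sits in between, with most of its parameters in the spatial convolution. This is only an accounting aid for comparing designs at a matched budget, not a reproduction of the reported experiments.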

Related Material

[bibtex]

@InProceedings{Fan_2019_CVPR_Workshops,
author = {Fan, Yuchen and Yu, Jiahui and Liu, Ding and Huang, Thomas S.},
title = {An Empirical Investigation of Efficient Spatio-Temporal Modeling in Video Restoration},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2019}
}