Adaptive Spatial-Temporal Fusion of Multi-Objective Networks for Compressed Video Perceptual Enhancement

He Zheng, Xin Li, Fanglong Liu, Lielin Jiang, Qi Zhang, Fu Li, Qingqing Dang, Dongliang He; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, pp. 268-275

Abstract


Perceptual quality enhancement of heavily compressed videos is a difficult, unsolved problem because no suitable perceptual similarity loss function between video pairs yet exists. Motivated by the observation that it is hard to design a unified training objective that is perceptually friendly for enhancing smooth regions and richly textured regions simultaneously, in this paper we propose a simple yet effective solution dubbed "Adaptive Spatial-Temporal Fusion of Two-Stage Multi-Objective Networks" (ASTF) to adaptively fuse the enhancement results from networks trained with two different optimization objectives. Specifically, the proposed ASTF takes an enhanced frame along with its neighboring frames as input and jointly predicts a mask that indicates regions with high-frequency texture details. We then use this mask to fuse the two enhancement results so that both smooth content and rich textures are retained. Extensive experiments show that our method achieves promising performance on compressed video perceptual quality enhancement.
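The core fusion step described above can be sketched as a per-pixel convex combination of the two networks' outputs, weighted by the predicted texture mask. The following is a minimal illustrative sketch, not the authors' implementation: the function name `astf_fuse` and the argument names are assumptions, and the mask here is supplied directly rather than predicted from the enhanced frame and its neighbors.

```python
import numpy as np

def astf_fuse(out_smooth, out_texture, mask):
    """Blend two enhancement results with a per-pixel mask.

    out_smooth:  H x W x C output of the network whose objective
                 favors smooth content (hypothetical name).
    out_texture: H x W x C output of the network whose objective
                 favors rich textures (hypothetical name).
    mask:        H x W x 1 map, ~1 in regions with high-frequency
                 texture details, ~0 in smooth regions.
    """
    mask = np.clip(mask, 0.0, 1.0)  # keep the blend a convex combination
    return mask * out_texture + (1.0 - mask) * out_smooth
```

Where the mask is close to 1, the fused frame follows the texture-oriented result; where it is close to 0, the smooth-content result dominates, so each region is served by the objective better suited to it.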

Related Material


[bibtex]
@InProceedings{Zheng_2021_CVPR,
  author    = {Zheng, He and Li, Xin and Liu, Fanglong and Jiang, Lielin and Zhang, Qi and Li, Fu and Dang, Qingqing and He, Dongliang},
  title     = {Adaptive Spatial-Temporal Fusion of Multi-Objective Networks for Compressed Video Perceptual Enhancement},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2021},
  pages     = {268-275}
}