Video Quality Assessment Based on Swin Transformer With Spatio-Temporal Feature Fusion and Data Augmentation

Wei Wu, Shuming Hu, Pengxiang Xiao, Sibin Deng, Yilin Li, Ying Chen, Kai Li; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 1846-1854

Abstract


While video enhancement has drawn significant interest and has been extensively studied in academia and industry, video quality assessment (VQA) for enhanced video has received comparatively little attention. Video enhancement methods typically alter attributes such as brightness, contrast, and color, which causes perceptual quality to fluctuate and makes the VQA task harder. In this paper, we propose a novel approach to the VQA task based on Swin Transformer with improved spatio-temporal feature fusion, which exploits stage-wise feature concatenation and provides competitive assessment performance. In addition, we propose an efficient data augmentation strategy to improve data diversity and further enhance assessment accuracy. Experimental results demonstrate that the proposed approach achieves state-of-the-art performance on two benchmark VQA datasets and ranks first in the CVPR NTIRE 2023 Quality Assessment for Video Enhancement Challenge, which indicates that the approach is not only promising for VQA of enhanced video but also applicable to general VQA tasks.

Related Material


@InProceedings{Wu_2023_CVPR,
    author    = {Wu, Wei and Hu, Shuming and Xiao, Pengxiang and Deng, Sibin and Li, Yilin and Chen, Ying and Li, Kai},
    title     = {Video Quality Assessment Based on Swin Transformer With Spatio-Temporal Feature Fusion and Data Augmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {1846-1854}
}