COVER: A Comprehensive Video Quality Evaluator

Chenlong He, Qi Zheng, Ruoxi Zhu, Xiaoyang Zeng, Yibo Fan, Zhengzhong Tu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 5799-5809

Abstract

Video quality assessment (VQA), especially for massive-scale user-generated content (UGC), is an essential yet challenging problem in computer vision and video analysis. Prior methods have been shown to be effective in mirroring subjective human opinion scores; however, they fail to capture the complex, multi-dimensional factors that affect overall perceptual quality. In this paper, we introduce COVER (Comprehensive Video quality Evaluator), a novel framework designed to evaluate video quality holistically from technical, aesthetic, and semantic perspectives. Specifically, COVER leverages three parallel branches: (1) a Swin Transformer backbone applied to spatially sampled crops to predict technical quality; (2) a ConvNet applied to subsampled frames to derive aesthetic quality; and (3) a CLIP image encoder applied to resized frames to obtain semantic quality. We further propose a simplified cross-gating block that lets the three branches interact before their features are fed into the prediction head. The final quality score is obtained as a weighted sum of the sub-scores, yielding a multi-faceted metric. Experimental results demonstrate that COVER outperforms state-of-the-art models on multiple UGC video quality datasets. Moreover, COVER offers a diagnostic quality report that explains the quality score along multiple dimensions, and it can process 1080p videos at 96 fps, 3x faster than the real-time requirement. We will make the code and models publicly available to facilitate future research on efficient and explainable video quality assessment.
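The fusion stage summarized in the abstract (cross-gating between branches, a head per branch, and a weighted sum of sub-scores) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the gating rule (each branch's features scaled elementwise by a sigmoid of the mean of the other branches' features), the linear heads, the toy feature values, and the equal branch weights are all hypothetical assumptions chosen for illustration. In COVER itself, the feature vectors would come from the Swin Transformer, ConvNet, and CLIP branches.

```python
import math
from typing import Dict, List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cross_gate(features: Dict[str, Vector]) -> Dict[str, Vector]:
    """Hypothetical simplified cross-gating (an assumption, not COVER's exact block):
    each branch's features are multiplied elementwise by a sigmoid gate computed
    from the mean of the other branches' features at the same position."""
    names = list(features)
    if len(names) < 2:
        raise ValueError("cross-gating needs at least two branches")
    dim = len(features[names[0]])
    if any(len(v) != dim for v in features.values()):
        raise ValueError("all branch features must have the same dimension")
    gated: Dict[str, Vector] = {}
    for name in names:
        others = [features[o] for o in names if o != name]
        gated[name] = [
            features[name][k] * sigmoid(sum(o[k] for o in others) / len(others))
            for k in range(dim)
        ]
    return gated


def linear_head(x: Vector, weights: Vector, bias: float = 0.0) -> float:
    """Hypothetical per-branch head: a single linear projection to a sub-score."""
    if len(x) != len(weights):
        raise ValueError("feature and weight lengths differ")
    return sum(xi * wi for xi, wi in zip(x, weights)) + bias


def overall_score(sub_scores: Dict[str, float], branch_weights: Dict[str, float]) -> float:
    """Final score as a weighted sum of the per-branch sub-scores."""
    return sum(branch_weights[name] * s for name, s in sub_scores.items())


if __name__ == "__main__":
    # Toy feature vectors standing in for the technical, aesthetic, and semantic branches.
    feats = {
        "technical": [0.2, -0.5, 1.0],
        "aesthetic": [0.4, 0.1, -0.3],
        "semantic": [-0.1, 0.6, 0.2],
    }
    gated = cross_gate(feats)
    head_w = [0.5, 0.5, 0.5]  # hypothetical head weights shared by all branches
    subs = {name: linear_head(v, head_w) for name, v in gated.items()}
    weights = {"technical": 1 / 3, "aesthetic": 1 / 3, "semantic": 1 / 3}  # assumed equal
    print("sub-scores:", subs)
    print("overall:", overall_score(subs, weights))
```

Keeping the sub-scores separate until the final weighted sum is what makes a per-dimension quality report possible: each branch's contribution can be inspected on its own before it is folded into the overall score.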

Related Material

BibTeX

@InProceedings{He_2024_CVPR,
    author    = {He, Chenlong and Zheng, Qi and Zhu, Ruoxi and Zeng, Xiaoyang and Fan, Yibo and Tu, Zhengzhong},
    title     = {COVER: A Comprehensive Video Quality Evaluator},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {5799-5809}
}