PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild

Kun Yuan, Hongbo Liu, Mading Li, Muyi Sun, Ming Sun, Jiachao Gong, Jinhua Hao, Chao Zhou, Yansong Tang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 2835-2845

Abstract


Video quality assessment (VQA) is a challenging problem due to the numerous factors that can affect the perceptual quality of a video e.g. content attractiveness distortion type motion pattern and level. However annotating the Mean opinion score (MOS) for videos is expensive and time-consuming which limits the scale of VQA datasets and poses a significant obstacle for deep learning-based methods. In this paper we propose a VQA method named PTM-VQA which leverages PreTrained Models to transfer knowledge from models pretrained on various pre-tasks enabling benefits for VQA from different aspects. Specifically we extract features of videos from different pretrained models with frozen weights and integrate them to generate representation. Since these models possess various fields of knowledge and are often trained with labels irrelevant to quality we propose an Intra-Consistency and Inter-Divisibility (ICID) loss to impose constraints on features extracted by multiple pretrained models. The intra-consistency constraint ensures that features extracted by different pretrained models are in the same unified quality-aware latent space while the inter-divisibility introduces pseudo clusters based on the annotation of samples and tries to separate features of samples from different clusters. Furthermore with a constantly growing number of pretrained models it is crucial to determine which models to use and how to use them. To address this problem we propose an efficient scheme to select suitable candidates. Models with better clustering performance on VQA datasets are chosen to be our candidates. Extensive experiments demonstrate the effectiveness of the proposed method.

Related Material


[pdf]
[bibtex]
@InProceedings{Yuan_2024_CVPR, author = {Yuan, Kun and Liu, Hongbo and Li, Mading and Sun, Muyi and Sun, Ming and Gong, Jiachao and Hao, Jinhua and Zhou, Chao and Tang, Yansong}, title = {PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {2835-2845} }