Boosting Image Quality Assessment through Efficient Transformer Adaptation with Local Feature Enhancement

Kangmin Xu, Liang Liao, Jing Xiao, Chaofeng Chen, Haoning Wu, Qiong Yan, Weisi Lin; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 2662-2672

Abstract


Image Quality Assessment (IQA) is a fundamental task in computer vision, yet it remains an unresolved challenge owing to intricate distortion conditions, diverse image contents, and the limited availability of data. Recently, the community has witnessed the emergence of numerous large-scale pretrained foundation models. However, it remains an open question whether the scaling law observed in high-level tasks also applies to IQA, which is closely tied to low-level cues. In this paper, we demonstrate that, with a proper injection of local distortion features, a larger pretrained vision transformer (ViT) foundation model performs better in IQA tasks. Specifically, because the large-scale pretrained ViT lacks local distortion structure and inductive bias, we use another pretrained convolutional neural network (CNN), which is well known for capturing local structure, to extract multi-scale image features. Further, we propose a local distortion extractor to obtain local distortion features from the pretrained CNN and a local distortion injector to inject these features into the ViT. By training only the extractor and the injector, our method benefits from the rich knowledge in powerful foundation models and achieves state-of-the-art performance on popular IQA datasets. This indicates that IQA is not only a low-level problem but also benefits from stronger high-level features drawn from large-scale pretrained models. Code is publicly available at: https://github.com/NeosXu/LoDa.
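The abstract describes the extractor and injector only at a high level. The sketch below illustrates the general idea under stated assumptions; it is not the authors' implementation (see the LoDa repository for that). It assumes that the extractor flattens each CNN stage's feature map into tokens and linearly projects them to a shared width. It also assumes that the injector is a cross-attention block in which the ViT tokens act as queries over those local distortion tokens, added back as a gated residual. The function names (extract_local_features, inject), the toy dimensions, and the zero-initialized scalar gate are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def extract_local_features(cnn_maps, proj_weights):
    """Hypothetical local distortion extractor: turn each multi-scale CNN
    feature map of shape (C_i, H_i, W_i) into H_i*W_i tokens and project
    them to a shared width d, then stack all scales together."""
    tokens = []
    for fmap, w in zip(cnn_maps, proj_weights):
        c, h, width = fmap.shape
        flat = fmap.reshape(c, h * width).T   # (H_i*W_i, C_i)
        tokens.append(flat @ w)               # (H_i*W_i, d)
    return np.concatenate(tokens, axis=0)     # (N_local, d)

def inject(vit_tokens, local_tokens, wq, wk, wv, gate):
    """Hypothetical local distortion injector: ViT tokens (queries) attend
    to local distortion tokens (keys/values). The result is added back as a
    gated residual, so with gate == 0 the frozen ViT tokens pass through."""
    q = vit_tokens @ wq                       # (N_vit, d)
    k = local_tokens @ wk                     # (N_local, d)
    v = local_tokens @ wv                     # (N_local, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N_vit, N_local)
    return vit_tokens + gate * (attn @ v)     # (N_vit, D)

# Toy usage with random stand-ins for frozen backbone outputs.
rng = np.random.default_rng(0)
D, d = 16, 8                                  # ViT width, injector width (toy sizes)
vit_tokens = rng.standard_normal((197, D))    # e.g. 196 patch tokens + [CLS]
cnn_maps = [rng.standard_normal((32, 14, 14)),   # two CNN stages at different scales
            rng.standard_normal((64, 7, 7))]

# Only these small matrices and the gate would be trained; the ViT and CNN stay frozen.
proj = [rng.standard_normal((32, d)) * 0.1, rng.standard_normal((64, d)) * 0.1]
wq = rng.standard_normal((D, d)) * 0.1
wk = rng.standard_normal((d, d)) * 0.1
wv = rng.standard_normal((d, D)) * 0.1

local = extract_local_features(cnn_maps, proj)
out = inject(vit_tokens, local, wq, wk, wv, gate=0.0)
n_trainable = sum(p.size for p in proj + [wq, wk, wv]) + 1  # +1 for the scalar gate
```

Starting the gate at zero is a common adapter-style choice (assumed here, not taken from the paper): at initialization the injector leaves the pretrained ViT features unchanged, and the trainable parameters (proj, wq, wk, wv, gate) number only in the thousands for these toy sizes, while the backbones contribute none.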

Related Material


@InProceedings{Xu_2024_CVPR,
    author    = {Xu, Kangmin and Liao, Liang and Xiao, Jing and Chen, Chaofeng and Wu, Haoning and Yan, Qiong and Lin, Weisi},
    title     = {Boosting Image Quality Assessment through Efficient Transformer Adaptation with Local Feature Enhancement},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {2662-2672}
}