Saliency-Guided Transformer Network Combined With Local Embedding for No-Reference Image Quality Assessment

Zhu, Mengmeng; Hou, Guanqun; Chen, Xinjia; Xie, Jiaxing; Lu, Haixian; Che, Jun

Mengmeng Zhu, Guanqun Hou, Xinjia Chen, Jiaxing Xie, Haixian Lu, Jun Che; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 1953-1962

Abstract

No-Reference Image Quality Assessment (NR-IQA) methods based on the Vision Transformer have recently drawn much attention for their superior performance. Unfortunately, because they combine NR-IQA and the Transformer only crudely, they can hardly exploit the properties of either. In this paper, we propose a novel Saliency-Guided Transformer Network combined with Local Embedding (TranSLA) for No-Reference Image Quality Assessment. TranSLA integrates multi-level information to obtain a robust representation. Existing research has shown that the human visual system concentrates more on the Region of Interest (RoI) when assessing image quality. We therefore combine saliency prediction with the Transformer to guide the model to highlight the RoI when aggregating global information. In addition, we introduce a local embedding for the Transformer derived from the gradient map. Since the gradient map captures fine-grained structural features, it supplements the Transformer with local information, so that both local and non-local information can be exploited. Moreover, to accelerate the aggregation of information from all tokens, we introduce a Boosting Interaction Module (BIM), which forces patch tokens to interact more effectively with the class tokens at all levels. Experiments on two large-scale NR-IQA benchmarks demonstrate that our method significantly outperforms state-of-the-art methods.
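The abstract does not give TranSLA's equations, so the following is only a minimal NumPy sketch of two of the ideas it names, built on our own assumptions. A Sobel gradient-magnitude map is cut into patches and linearly projected as a "local embedding" that is added to the image patch tokens, and a saliency map is turned into an additive log-bias on class-token attention logits so that high-saliency patches dominate the global aggregation. The Sobel operator, the additive fusion, the log-bias, the random projections, and every function name (`sobel_gradient_map`, `patchify`, `saliency_guided_pool`) are hypothetical and not taken from the paper; the saliency map here is random noise standing in for a saliency predictor, and BIM is not modeled.

```python
import numpy as np


def sobel_gradient_map(gray):
    """Gradient magnitude of a 2-D grayscale image using 3x3 Sobel kernels (edge padding)."""
    kx = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
    ky = kx.T  # vertical-gradient kernel
    h, w = gray.shape
    padded = np.pad(gray.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Accumulate the 3x3 correlation one kernel tap at a time.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)


def patchify(x, p):
    """Split an (H, W) map into non-overlapping p x p patches, one flattened patch per row."""
    h, w = x.shape
    assert h % p == 0 and w % p == 0, "H and W must be multiples of p"
    return x.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)


def saliency_guided_pool(tokens, cls_query, patch_saliency, eps=1e-6):
    """Class-token style attention pooling with log-saliency added to the logits (assumed scheme)."""
    d = tokens.shape[1]
    logits = tokens @ cls_query / np.sqrt(d)
    logits = logits + np.log(patch_saliency + eps)  # low-saliency patches are suppressed
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ tokens, weights


rng = np.random.default_rng(0)
image = rng.random((32, 32))     # stand-in grayscale image
saliency = rng.random((32, 32))  # stand-in for a saliency predictor's output map
p, d = 8, 16                     # hypothetical patch size and embedding width
w_img = rng.normal(size=(p * p, d)) / p   # hypothetical linear patch embedding
w_grad = rng.normal(size=(p * p, d)) / p  # hypothetical local (gradient) embedding

# Assumed fusion: gradient-patch embedding added to the image-patch embedding.
tokens = patchify(image, p) @ w_img + patchify(sobel_gradient_map(image), p) @ w_grad
patch_saliency = patchify(saliency, p).mean(axis=1)  # one saliency score per patch
cls_query = rng.normal(size=d)

pooled, weights = saliency_guided_pool(tokens, cls_query, patch_saliency)
print(pooled.shape, weights.shape, float(weights.sum()))
```

Adding log(s) to a logit is equivalent to multiplying that patch's softmax weight by its saliency s and renormalizing, which is one simple way to make aggregation favour the RoI; the paper may realize saliency guidance differently.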

Related Material

@InProceedings{Zhu_2021_ICCV,
    author    = {Zhu, Mengmeng and Hou, Guanqun and Chen, Xinjia and Xie, Jiaxing and Lu, Haixian and Che, Jun},
    title     = {Saliency-Guided Transformer Network Combined With Local Embedding for No-Reference Image Quality Assessment},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2021},
    pages     = {1953-1962}
}