Attention Retractable Frequency Fusion Transformer for Image Super-Resolution
Transformer-based image super-resolution (SR) has offered promising performance gains over convolutional neural network (CNN)-based methods owing to its parameter-independent global interactions. However, existing Transformer-based methods struggle to capture sufficient global information because self-attention is computed within non-overlapping windows, which restricts the receptive field. To address this issue, we construct an effective image SR model based on an attention retractable frequency Transformer with a proposed spatial-frequency fusion block. The spatial-frequency fusion block strengthens the representation ability of the Transformer and extends the receptive field to the whole image, improving the quality of SR results. Furthermore, we propose a progressive training strategy that trains the SR model on image patches of increasing size to further improve performance. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods on various benchmark datasets, both objectively and subjectively.
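The key idea behind the spatial-frequency fusion block, combining a local spatial branch with a frequency-domain branch whose receptive field spans the entire input, can be illustrated with a toy 1-D sketch. This is a minimal illustration under our own assumptions, not the paper's actual block: the function names, the unit-gain spectral modulation, and the weighted-sum fusion with `alpha` are all hypothetical, and the real block operates on 2-D feature maps inside a Transformer.

```python
import numpy as np

def spatial_branch(x, kernel):
    # Local mixing: a small "same"-padded convolution, a stand-in for
    # window-limited spatial operations with a restricted receptive field.
    return np.convolve(x, kernel, mode="same")

def frequency_branch(x, gain):
    # Global mixing: modulating the full spectrum means every output
    # sample depends on every input sample (whole-signal receptive field).
    spec = np.fft.rfft(x)
    return np.fft.irfft(spec * gain, n=x.size)

def fuse(x, kernel, gain, alpha=0.5):
    # Hypothetical fusion: a weighted sum of the local (spatial) and
    # global (frequency) features; a learned fusion would replace alpha.
    return alpha * spatial_branch(x, kernel) + (1 - alpha) * frequency_branch(x, gain)

x = np.arange(8, dtype=float)
y = fuse(x, kernel=np.array([0.25, 0.5, 0.25]), gain=np.ones(5))
```

With a unit gain the frequency branch is an identity round-trip through the FFT; a learned `gain` would instead reweight frequencies, giving the block global context that windowed self-attention alone lacks.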