A Simple Transformer-Style Network for Lightweight Image Super-Resolution
Single image super-resolution (SISR) has attracted considerable attention in recent years because of its wide range of real-world applications. However, most recently developed methods are computationally expensive and memory-intensive. To address this issue, we propose a simple Transformer-style network (STSN) for the image super-resolution (SR) task. The method is built on convolutional modulation (Conv2Former), a very simple block whose complexity is linear, rather than quadratic as in Transformers. Conv2Former simplifies the self-attention mechanism by using only convolutions and the Hadamard product. We further improve the original Conv2Former so that it can extract local features, which is helpful for the SR task. Combining this Conv2Former with a multi-layer perceptron (MLP), we propose a convolutional modulation block (Conv2FormerB) that is similar to a Transformer block. The STSN is then built for the SISR task from Conv2FormerB, a 3 × 3 convolution, and an enhanced spatial attention (ESA) block. The STSN achieves good results on multiple SR benchmarks and attains a 5.6× faster run time than LWSwinIR.
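To make the idea of convolutional modulation concrete, the following is a minimal pure-Python sketch of the general pattern the abstract describes: a depthwise convolution produces a modulation map, which is combined with a linearly projected value branch through a Hadamard product. It is an illustration under stated assumptions, not the STSN implementation. The function names (`pointwise_conv`, `depthwise_conv`, `conv_modulation`), the weight layout, and the zero padding are all hypothetical, and the local-feature improvement mentioned above is not modelled because the abstract does not specify it.

```python
from typing import List

# A feature map indexed as [channel][row][col]. Illustrative type alias only.
Tensor3 = List[List[List[float]]]


def pointwise_conv(x: Tensor3, weight: List[List[float]]) -> Tensor3:
    """1x1 convolution: mixes channels independently at every pixel.

    weight[o][i] is the contribution of input channel i to output channel o.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(weight[o][i] * x[i][r][c] for i in range(c_in)) for c in range(w)]
         for r in range(h)]
        for o in range(len(weight))
    ]


def depthwise_conv(x: Tensor3, kernels: Tensor3) -> Tensor3:
    """Depthwise k x k convolution (odd k), zero padded, one kernel per channel.

    Computed as cross-correlation, as is usual in deep learning frameworks.
    """
    h, w = len(x[0]), len(x[0][0])
    out = []
    for plane_in, kernel in zip(x, kernels):
        k = len(kernel)
        pad = k // 2
        plane_out = [[0.0] * w for _ in range(h)]
        for r in range(h):
            for c in range(w):
                acc = 0.0
                for dr in range(k):
                    for dc in range(k):
                        rr, cc = r + dr - pad, c + dc - pad
                        if 0 <= rr < h and 0 <= cc < w:  # zero padding
                            acc += kernel[dr][dc] * plane_in[rr][cc]
                plane_out[r][c] = acc
        out.append(plane_out)
    return out


def hadamard(a: Tensor3, b: Tensor3) -> Tensor3:
    """Element-wise (Hadamard) product of two equally shaped feature maps."""
    return [
        [[p * q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(pl_a, pl_b)]
        for pl_a, pl_b in zip(a, b)
    ]


def conv_modulation(x: Tensor3, w1, w2, kernels: Tensor3) -> Tensor3:
    """Assumed form: Z = DWConv(W1 x) * (W2 x), with * the Hadamard product."""
    a = depthwise_conv(pointwise_conv(x, w1), kernels)  # modulation map
    v = pointwise_conv(x, w2)                           # value branch
    return hadamard(a, v)
```

Every step visits each pixel a fixed number of times (at most k × k taps per channel), so the cost grows linearly with the number of pixels. Self-attention, by contrast, compares every pixel with every other pixel, which is where its quadratic cost comes from. As a sanity check, identity 1 × 1 weights together with a kernel that is 1 at its centre and 0 elsewhere reduce the output to the element-wise square of the input.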