Dynamic Window Transformer for Image Super-Resolution

Zheng Xie, Zhongxun Wang, Tianci Qin, Zhexuan Han, Ruoyu Zhou; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 3836-3850

Abstract


Image super-resolution (SR) reconstruction is a critical task in image processing that aims to generate high-resolution (HR) images from low-resolution (LR) inputs. Recently, Swin-Transformer-based models have become mainstream in this field due to their efficient handling of computational complexity and their scalability: window-based mechanisms are employed to effectively extract local features, and window-interaction strategies are utilized to enhance global information integration. However, existing Swin-Transformer-based SR models employ a fixed-window strategy, confining attention to fixed regions. In this paper, we present the Dynamic Window Transformer (DWT), a simple but novel method that uses windows of various shapes to extract diverse features and achieves efficient global dependency modelling by exploiting image anisotropy. The core of our DWT is Dynamic-Window Self-Attention (DWSA), which dynamically selects the optimal window for each input before performing self-attention. We evaluate our model on popular benchmark datasets and compare it with other state-of-the-art (SOTA) lightweight models. For example, our DWT achieves a PSNR of 26.56 dB on the Urban100 dataset, 0.09 dB higher than that of the SOTA model SwinIR.
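The abstract's core idea, selecting a window shape per input and then running self-attention inside non-overlapping windows, can be sketched as below. This is a minimal, dependency-light illustration, not the paper's implementation: the candidate window shapes, the anisotropy gate (comparing feature variance along height vs. width), and all function names are assumptions for exposition, and the Q/K/V projections are left as identities.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(feat, win_h, win_w):
    """Self-attention within non-overlapping (win_h, win_w) windows.

    feat: (H, W, C) array with H % win_h == 0 and W % win_w == 0.
    Q, K, V projections are identities to keep the sketch self-contained.
    """
    H, W, C = feat.shape
    # Partition into windows: (num_windows, win_h * win_w, C).
    x = feat.reshape(H // win_h, win_h, W // win_w, win_w, C)
    x = x.transpose(0, 2, 1, 3, 4).reshape(-1, win_h * win_w, C)
    # Scaled dot-product attention within each window.
    attn = softmax(x @ x.transpose(0, 2, 1) / np.sqrt(C), axis=-1)
    out = attn @ x
    # Reverse the window partition back to (H, W, C).
    out = out.reshape(H // win_h, W // win_w, win_h, win_w, C)
    return out.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def dynamic_window_attention(feat, candidates=((8, 8), (4, 16), (16, 4))):
    """Hypothetical gate: pick one candidate (win_h, win_w) per input from
    an anisotropy cue, then run window attention with the chosen shape."""
    row_var = feat.mean(axis=1).var()  # variation along the height axis
    col_var = feat.mean(axis=0).var()  # variation along the width axis
    if np.isclose(row_var, col_var, rtol=0.1):
        win = candidates[0]  # roughly isotropic content -> square window
    elif row_var > col_var:
        win = candidates[1]  # short-and-wide window (4 x 16)
    else:
        win = candidates[2]  # tall-and-narrow window (16 x 4)
    return window_attention(feat, *win), win
```

In the actual DWSA module the window choice would be learned end-to-end rather than decided by this hand-written variance heuristic; the sketch only shows how a per-input shape selection composes with standard window partitioning.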

Related Material


[pdf]
[bibtex]
@InProceedings{Xie_2024_ACCV,
  author    = {Xie, Zheng and Wang, Zhongxun and Qin, Tianci and Han, Zhexuan and Zhou, Ruoyu},
  title     = {Dynamic Window Transformer for Image Super-Resolution},
  booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
  month     = {December},
  year      = {2024},
  pages     = {3836-3850}
}