SwiniPASSR: Swin Transformer Based Parallax Attention Network for Stereo Image Super-Resolution
With binocular cameras now widely adopted, stereo image super-resolution (Stereo SR) has received increasing attention. Unlike single image super-resolution (SISR), Stereo SR is more challenging because it must exploit both intra-view and cross-view information. Although prior convolution-based works have made admirable progress, few attempts have explored Transformer-based architectures for Stereo SR, even though Transformers have demonstrated promising performance in several visual tasks. In this paper, we propose a novel approach named SwiniPASSR, which adopts the Swin Transformer as its backbone and combines it with a Bi-directional Parallax Attention Module (biPAM) to make full use of the auxiliary information provided by the binocular mechanism. Although the Transformer and the parallax attention mechanism (PAM) have each been proven useful by prior studies, we find that simply integrating a convolution-based PAM with a Transformer, or directly optimizing for the Stereo SR problem, may not achieve desirable results. We therefore introduce a conversion layer to resolve the integration issue and adopt a progressive training strategy to learn disparity correspondence through progressively enlarged receptive fields. Extensive experiments and ablation studies demonstrate the effectiveness of the proposed SwiniPASSR. In particular, in the NTIRE 2022: Stereo Image Super-Resolution Challenge, we report 23.71 dB PSNR and 0.7295 SSIM, which ranked 2nd on the leaderboard. Source code is available at https://github.com/SMI-Lab/SwinIPASSR.
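To make the idea of bi-directional parallax attention concrete, the following is a minimal, dependency-free sketch of attention computed along a single image row (an epipolar line in a rectified stereo pair). It is an illustration under stated assumptions, not the SwiniPASSR implementation: the function names, the use of raw dot-product scores without learned projections, and the toy one-hot features are all hypothetical choices made for clarity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query_row, key_row):
    """Attention map A[i][j]: how strongly query position i matches key position j.

    Both inputs are lists of per-pixel feature vectors from the same image row.
    """
    return [softmax([dot(q, k) for k in key_row]) for q in query_row]


def warp(attention, source_row):
    """For each target position, return the attention-weighted sum of source features."""
    channels = len(source_row[0])
    return [
        [sum(w * feat[c] for w, feat in zip(weights, source_row)) for c in range(channels)]
        for weights in attention
    ]


def bidirectional_parallax_attention(left_row, right_row):
    """Warp right features to the left view and left features to the right view.

    Hypothetical helper: a real module would use learned query/key projections.
    """
    right_to_left = attend(left_row, right_row)  # left queries, right keys
    left_to_right = attend(right_row, left_row)  # right queries, left keys
    return warp(right_to_left, right_row), warp(left_to_right, left_row)


# Toy row of three pixels with one-hot features; the right view is the left view
# shifted by one pixel (wrapping around, which is only a convenience for the toy case).
left_row = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
right_row = [[0.0, 5.0, 0.0], [0.0, 0.0, 5.0], [5.0, 0.0, 0.0]]

left_aligned, right_aligned = bidirectional_parallax_attention(left_row, right_row)
```

In this toy case each pixel has exactly one strong match in the other view, so the features warped from the right view line up almost exactly with the left view, and vice versa. Restricting attention to a single row is what lets the mechanism search over disparities without assuming a fixed maximum shift.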