@InProceedings{Wu_2026_CVPR,
    author    = {Wu, Zhiqiang and Dong, Yitong and Wei, Xian},
    title     = {TUDSR: Twice Upsampling-Diffusion for Higher Super-Resolution},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {38208-38217}
}
TUDSR: Twice Upsampling-Diffusion for Higher Super-Resolution
Abstract
Diffusion-based generative models have achieved remarkable success in real-world image super-resolution (SR). With tiled diffusion techniques, these models can produce high-resolution images that exceed their natively supported resolution. However, the quality of such high-resolution (e.g., 2048^2) outputs often remains extremely poor. We attribute this primarily to two factors: an image upsampling ratio (e.g., ×8) that exceeds the model's natively supported upsampling ratio (e.g., ×4), and the model's natively supported resolution itself. In practice, training a native high-resolution model requires larger architectures, which incur significant computational overhead and GPU memory costs, making such training difficult on resource-limited hardware. We therefore present TUDSR, a Twice Upsampling-Diffusion framework for higher SR. The TUDSR framework consists mainly of two stages: the first trains at R-resolution, and the second introduces a looped chunk-based training strategy at NR-resolution. Each stage adopts a one-step GAN architecture comprising a generator and a discriminator. Based on SD2.1-base, we develop TUDSR-S, which achieves state-of-the-art performance across multiple benchmarks. Extensive experiments further demonstrate that TUDSR-S generates high-quality images at resolutions of 1024^2 and even 2048^2, significantly outperforming existing approaches. Code is available at https://github.com/wuer5/TUDSR.
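To make the tiling idea mentioned in the abstract concrete, the following is a minimal, generic sketch of how an image larger than a model's native resolution can be covered with overlapping native-size tiles. It is not taken from the TUDSR code base: the function names `tile_origins` and `tile_grid`, the 512-pixel tile size, and the 64-pixel overlap are illustrative assumptions only.

```python
def tile_origins(size, tile, overlap):
    """Start offsets of tiles covering [0, size) along one axis.

    Hypothetical helper for illustration; not part of TUDSR.
    """
    if tile >= size:
        return [0]  # a single tile already covers the whole axis
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than tile")
    origins = list(range(0, size - tile, stride))
    origins.append(size - tile)  # final tile sits flush with the far edge
    return origins


def tile_grid(height, width, tile, overlap):
    """All (top, left) corners of tiles covering a height x width image."""
    return [
        (top, left)
        for top in tile_origins(height, tile, overlap)
        for left in tile_origins(width, tile, overlap)
    ]


# Assumed numbers: a 2048 x 2048 target, 512-pixel tiles, 64-pixel overlap.
corners = tile_grid(2048, 2048, tile=512, overlap=64)
print(len(corners))
```

Each tile would then be processed independently at the native resolution and blended in the overlap regions; the abstract's observation is that this alone does not guarantee good quality when the overall upsampling ratio exceeds what the model natively supports.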

