@InProceedings{Du_2025_ICCV,
    author    = {Du, Peng and Li, Hui and Xu, Han and Jeon, Paul Barom and Lee, Dongwook and Ji, Daehyun and Yang, Ran and Zhu, Feng},
    title     = {Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {19700-19710}
}
Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution
Abstract
Discrete Wavelet Transform (DWT) has been widely explored to enhance the performance of image super-resolution (SR). Although some DWT-based methods improve SR by capturing fine-grained frequency signals, most existing approaches neglect the interrelations among multi-scale frequency sub-bands, resulting in inconsistencies and unnatural artifacts in the reconstructed images. To address this challenge, we propose a Diffusion Transformer model based on image Wavelet spectra for SR (DTWSR). DTWSR combines the strengths of diffusion models and transformers to capture the interrelations among multi-scale frequency sub-bands, leading to more consistent and realistic SR images. Specifically, we use a Multi-level Discrete Wavelet Transform (MDWT) to decompose images into wavelet spectra. We propose a pyramid tokenization method that embeds the spectra into a sequence of tokens for the transformer model, making it easier to capture features from both the spatial and frequency domains. A dual decoder is carefully designed to handle the distinct variances in low-frequency (LF) and high-frequency (HF) sub-bands without neglecting their alignment during image generation. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our method, which performs well in both perceptual quality and fidelity.
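As a rough illustration of the multi-level decomposition mentioned in the abstract, the sketch below applies a 2D Haar wavelet repeatedly to the low-frequency band. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which wavelet or how many levels DTWSR uses, and the function names `haar_dwt2` and `mdwt` as well as the sub-band labels are hypothetical choices made for this example.

```python
def haar_dwt2(img):
    """One level of an orthonormal 2D Haar DWT (assumed wavelet, for illustration).

    `img` is a list of rows with even height and width.
    Returns (ll, (lh, hl, hh)), each sub-band half the size of `img`.
    Sub-band naming conventions differ between libraries.
    """
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(img), 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, len(img[0]), 2):
            # The four pixels of one 2x2 block.
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2)  # local average (low frequency)
            r_lh.append((a + b - c - d) / 2)  # difference between the two rows
            r_hl.append((a - b + c - d) / 2)  # difference between the two columns
            r_hh.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, (lh, hl, hh)


def mdwt(img, levels):
    """Multi-level DWT: re-decompose the low-frequency band `levels` times.

    Returns the coarsest LL band and a list of (lh, hl, hh) tuples,
    ordered from the finest level to the coarsest.
    """
    details = []
    ll = img
    for _ in range(levels):
        ll, high = haar_dwt2(ll)
        details.append(high)
    return ll, details
```

For a 4x4 input and `levels=2`, `mdwt` returns a 1x1 low-frequency band plus high-frequency sub-bands of size 2x2 and 1x1. This pyramid of differently sized sub-bands is the kind of multi-scale spectrum that the tokenization step described above would need to embed.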
