@InProceedings{Liu_2026_CVPR,
    author    = {Liu, Ziming and Yang, Yifan and Zhang, Chengruidong and Zhang, Yiqi and Qiu, Lili and You, Yang and Yang, Yuqing},
    title     = {Region-Adaptive Sampling for Diffusion Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {2346-2356}
}
Region-Adaptive Sampling for Diffusion Transformers
Abstract
Diffusion models (DMs) have become the state-of-the-art for generative tasks across domains, but their reliance on sequential forward passes limits real-time performance. Prior acceleration methods mainly reduce sampling steps or reuse intermediate results. Leveraging the flexibility of Diffusion Transformers (DiTs) to handle variable token counts, we propose RAS, a training-free sampling strategy that dynamically assigns different update ratios to image regions based on model focus. Our key observation is that at each step, DiTs concentrate on semantically meaningful areas, and these regions exhibit strong continuity across consecutive steps. Exploiting this, RAS updates only focused regions while reusing cached noise for others, with focus determined from the previous step's output. Evaluated on Stable Diffusion 3 and Lumina-Next-T2I, RAS achieves up to 2.36x and 2.51x speedups, respectively, with minimal quality loss. This demonstrates a practical step toward more efficient diffusion transformers for real-time generation.
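The update rule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the focus score (magnitude of the previous step's noise prediction), the per-token `model` callable, the Euler-style latent update, and the names `select_focused` and `ras_step` are all assumptions made for illustration. A real DiT attends across tokens, which this sketch ignores.

```python
from typing import Callable, List, Sequence, Tuple


def select_focused(prev_noise: Sequence[float], update_ratio: float) -> List[int]:
    """Pick the token indices to refresh at this step.

    Assumption: focus is scored as |previous noise prediction|. The abstract
    only states that focus is determined from the previous step's output.
    """
    k = max(1, round(update_ratio * len(prev_noise)))
    ranked = sorted(range(len(prev_noise)),
                    key=lambda i: abs(prev_noise[i]), reverse=True)
    return sorted(ranked[:k])


def ras_step(
    latents: Sequence[float],
    cached_noise: Sequence[float],
    model: Callable[[List[float]], List[float]],
    update_ratio: float,
    step_size: float,
) -> Tuple[List[float], List[float], List[int]]:
    """One sampling step that runs the model only on focused tokens."""
    focused = select_focused(cached_noise, update_ratio)
    # Only the focused tokens go through the (hypothetical) model.
    fresh = model([latents[i] for i in focused])
    # All other tokens reuse the noise cached from the previous step.
    noise = list(cached_noise)
    for i, value in zip(focused, fresh):
        noise[i] = value
    # Simplified Euler-style update, assumed here for illustration.
    new_latents = [x - step_size * n for x, n in zip(latents, noise)]
    return new_latents, noise, focused


def toy_model(tokens: List[float]) -> List[float]:
    """Stand-in for a DiT forward pass on a subset of tokens."""
    return [0.5 * t for t in tokens]


latents = [4.0, 0.2, -3.0, 0.1]
cached = [1.0, 0.1, -1.0, 0.05]
new_latents, noise, focused = ras_step(latents, cached, toy_model, 0.5, 0.5)
```

With an update ratio of 0.5, only two of the four tokens pass through `toy_model`; the other two keep their cached noise, which is where the savings would come from in a real model whose cost grows with the token count.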