Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

Xuran Ma, Yexin Liu, Yaofu Liu, Xianfeng Wu, Mingzhe Zheng, Zihao Wang, Ser-Nam Lim, Harry Yang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 17150-17159

Abstract


Video generation using diffusion models has shown remarkable progress, yet it remains computationally expensive due to the repeated processing of redundant features across blocks and steps. To address this, we propose a novel adaptive feature reuse mechanism that dynamically identifies and caches the most informative features, recomputing foreground regions while caching background regions more aggressively, which significantly reduces computational overhead while largely preserving video quality. By combining step-level and block-level caching, our method achieves up to a 1.8x speedup on HunyuanVideo while maintaining competitive performance on VBench, PSNR, SSIM, FID, and LPIPS. Extensive experiments demonstrate that our approach not only improves efficiency but also enhances the quality of generated videos. The proposed method is generalizable and can be integrated into existing diffusion transformer frameworks.
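To illustrate the idea of step- and block-level feature reuse with a looser policy for background tokens, the sketch below caches a transformer block's output and reuses it on later denoising steps for tokens whose inputs changed little. The class name, thresholds, and per-token change metric are illustrative assumptions, not the paper's profiled values.

```python
import numpy as np

class BlockFeatureCache:
    """Minimal sketch of adaptive feature reuse across diffusion steps.

    Foreground tokens use a strict change threshold (recomputed often);
    background tokens use a loose one (cached more aggressively).
    All thresholds here are hypothetical, for illustration only.
    """

    def __init__(self, fg_threshold=0.05, bg_threshold=0.20):
        self.fg_threshold = fg_threshold  # strict: recompute foreground often
        self.bg_threshold = bg_threshold  # loose: reuse background more
        self.prev_input = None
        self.prev_output = None

    def __call__(self, x, fg_mask, block_fn):
        # x: (tokens, dim) block input; fg_mask: (tokens,) bool foreground mask
        if self.prev_input is None:
            out = block_fn(x)  # first step: compute everything, fill the cache
            self.prev_input, self.prev_output = x.copy(), out.copy()
            return out
        # Relative per-token change since the last cached step.
        delta = np.linalg.norm(x - self.prev_input, axis=1)
        scale = np.linalg.norm(self.prev_input, axis=1) + 1e-8
        thresh = np.where(fg_mask, self.fg_threshold, self.bg_threshold)
        recompute = (delta / scale) > thresh
        out = self.prev_output.copy()  # start from cached features
        if recompute.any():
            out[recompute] = block_fn(x[recompute])  # refresh changed tokens
        self.prev_input[recompute] = x[recompute]
        self.prev_output = out.copy()
        return out
```

In a real diffusion transformer, one such cache would be kept per block, so features are skipped along both the step axis and the block axis; here `block_fn` stands in for a block's forward pass.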

Related Material


@InProceedings{Ma_2025_ICCV,
    author    = {Ma, Xuran and Liu, Yexin and Liu, Yaofu and Wu, Xianfeng and Zheng, Mingzhe and Wang, Zihao and Lim, Ser-Nam and Yang, Harry},
    title     = {Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {17150-17159}
}