D2Cache: Second-Order Delta Caching for Higher Video Diffusion Acceleration

Enhuai Liu, Yunke Wang, Changming Sun, Chang Xu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 43589-43599

Abstract


Video diffusion models achieve impressive visual fidelity but remain computationally prohibitive for real-time or interactive generation because of their sequential denoising process. Recent caching methods accelerate inference by reusing outputs across timesteps, typically estimating each new output from the first-order residual, i.e., the difference between adjacent model predictions. To mitigate the error that accumulates in such caching methods, we propose D2Cache, a training-free method that exploits the smoothness of the second-order residual delta, i.e., the temporal difference between consecutive first-order residuals, to predict outputs at future timesteps more accurately. We show theoretically that this second-order correction improves prediction accuracy and effectively suppresses cumulative errors. Moreover, D2Cache adaptively scales the second-order deltas using error estimates derived from timestep embeddings, maintaining accuracy across varying cache intervals. Empirically, D2Cache outperforms the state-of-the-art TeaCache across four video diffusion models (Latte, Open-Sora, LTX-video, and Wan2.1) at comparable acceleration rates, with even larger gains under higher acceleration settings.
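The difference between first-order reuse and a second-order correction can be illustrated with a toy extrapolation over plain Python lists standing in for latent tensors. This is a minimal sketch, not the authors' implementation: the function names, the three-output history layout, and the fixed `scale` argument (a hypothetical stand-in for D2Cache's adaptive scaling from timestep-embedding error estimates, which is not modeled here) are all assumptions. It also assumes the first-order residual changes by a constant delta across every skipped step.

```python
from typing import List, Sequence

Vector = List[float]


def residuals(prev: Sequence[float], curr: Sequence[float]) -> Vector:
    """Elementwise difference curr - prev between two cached outputs."""
    return [c - p for p, c in zip(prev, curr)]


def extrapolate(history: Sequence[Sequence[float]], steps: int = 1,
                scale: float = 1.0, second_order: bool = True) -> Vector:
    """Predict the output `steps` timesteps after the last entry of `history`.

    history: the last three full model outputs y_{k-2}, y_{k-1}, y_k
             (hypothetical layout chosen for this sketch).
    scale:   fixed multiplier on the second-order delta; an assumed
             stand-in for an adaptive, timestep-dependent scale.
    """
    if len(history) < 3:
        raise ValueError("need at least three cached outputs")
    y_prev2, y_prev, y_curr = history[-3], history[-2], history[-1]
    r_prev = residuals(y_prev2, y_prev)   # first-order residual r_{k-1}
    r_curr = residuals(y_prev, y_curr)    # first-order residual r_k
    delta = residuals(r_prev, r_curr)     # second-order delta d_k = r_k - r_{k-1}

    n = steps
    out: Vector = []
    for y, r, d in zip(y_curr, r_curr, delta):
        # First-order caching: reuse r_k unchanged for each skipped step.
        pred = y + n * r
        if second_order:
            # If the residual grows by d per step, the skipped steps add
            # d * (1 + 2 + ... + n) = d * n * (n + 1) / 2 on top of n * r_k.
            pred += scale * d * n * (n + 1) / 2
        out.append(pred)
    return out


if __name__ == "__main__":
    # Quadratic signal y = t^2 sampled at t = 0, 1, 2.
    hist = [[0.0], [1.0], [4.0]]
    print(extrapolate(hist, steps=1, second_order=False))  # first-order guess for t = 3
    print(extrapolate(hist, steps=1))                      # second-order guess for t = 3
```

On the quadratic signal y = t² sampled at t = 0, 1, 2 (outputs 0, 1, 4), first-order reuse predicts 7 for t = 3, whereas the second-order correction recovers the true value 9; two steps ahead the predictions are 10 and 16 against a true value of 16. The gap grows with the number of skipped steps, a very simplified analogue of the abstract's observation that the advantage widens at higher acceleration.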

Related Material


BibTeX
@InProceedings{Liu_2026_CVPR,
    author    = {Liu, Enhuai and Wang, Yunke and Sun, Changming and Xu, Chang},
    title     = {D2Cache: Second-Order Delta Caching for Higher Video Diffusion Acceleration},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {43589-43599}
}