Scaling Properties of Diffusion Models For Perceptual Tasks

Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 12945-12954

Abstract

In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm not only for generation but also for visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perceptual tasks. Through a careful analysis of these scaling properties, we formulate compute-optimal training and inference recipes to scale diffusion models for visual perception tasks. Our models achieve performance competitive with state-of-the-art methods while using significantly less data and compute.

Related Material

@InProceedings{Ravishankar_2025_CVPR,
    author    = {Ravishankar, Rahul and Patel, Zeeshan and Rajasegaran, Jathushan and Malik, Jitendra},
    title     = {Scaling Properties of Diffusion Models For Perceptual Tasks},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {12945-12954}
}