@InProceedings{Zhang_2023_CVPR,
    author    = {Zhang, Han and Feng, Ruili and Yang, Zhantao and Huang, Lianghua and Liu, Yu and Zhang, Yifei and Shen, Yujun and Zhao, Deli and Zhou, Jingren and Cheng, Fan},
    title     = {Dimensionality-Varying Diffusion Process},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {14307-14316}
}
Dimensionality-Varying Diffusion Process
Abstract
Diffusion models, which learn to reverse a signal destruction process to generate new data, typically require the signal at each step to have the same dimension. We argue that, given the spatial redundancy in image signals, there is no need to maintain a high dimensionality throughout the evolution process, especially in the early generation phase. To this end, we make a theoretical generalization of the forward diffusion process via signal decomposition. Concretely, we decompose an image into multiple orthogonal components and control the attenuation of each component when perturbing the image. In this way, as the noise strength increases, we can diminish the inconsequential components and thus use a lower-dimensional signal to represent the source with little loss of information. Such a reformulation allows the dimensionality to vary in both training and inference of diffusion models. Extensive experiments on a range of datasets suggest that our approach substantially reduces the computational cost and achieves on-par or even better synthesis performance compared to baseline methods. We also show that our strategy facilitates high-resolution image synthesis and improves the FID of a diffusion model trained on FFHQ at 1024x1024 resolution from 52.40 to 10.46. Code is available at https://github.com/damo-vilab/dvdp.
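The decomposition idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a two-component split into 2x2 block means and a residual, and the names `decompose`, `perturb`, `a_low`, `a_high`, and `sigma` are hypothetical, chosen only for illustration.

```python
import numpy as np


def decompose(x):
    """Split an (H, W) image with even H and W into two orthogonal parts.

    Assumed split (illustrative only): `low` holds the 2x2 block means
    broadcast back to full size, `high` is the residual.
    """
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    coarse = blocks.mean(axis=(1, 3))  # (H/2, W/2) summary of the image
    low = np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)  # back to (H, W)
    high = x - low  # sums to zero inside every 2x2 block
    return low, high, coarse


def perturb(x, a_low, a_high, sigma, rng):
    """Attenuate each component separately, then add Gaussian noise."""
    low, high, _ = decompose(x)
    noise = rng.standard_normal(x.shape)
    return a_low * low + a_high * high + sigma * noise


rng = np.random.default_rng(0)
x = rng.random((8, 8))
low, high, coarse = decompose(x)

# With the residual fully attenuated and no noise, the full-size signal
# is determined entirely by the (H/2, W/2) array `coarse`.
x_mean = perturb(x, a_low=1.0, a_high=0.0, sigma=0.0, rng=rng)
```

In this toy setting, once `a_high` reaches zero the signal mean carries no more information than `coarse`, which is the sense in which a lower-dimensional signal can stand in for the source at high noise levels.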