SVDiff: Compact Parameter Space for Diffusion Fine-Tuning

Ligong Han, Yinxiao Li, Han Zhang, Peyman Milanfar, Dimitris Metaxas, Feng Yang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 7323-7334

Abstract


Recently, diffusion models have achieved remarkable success in text-to-image generation, enabling the creation of high-quality images from text prompts and other conditioning inputs. However, existing methods for customizing these models are limited in their ability to handle multiple personalized subjects and carry a risk of overfitting. Moreover, their large parameter space makes model storage inefficient. In this paper, we propose a novel approach that addresses these limitations for personalizing and customizing text-to-image diffusion models. Our method fine-tunes only the singular values of the weight matrices, leading to a compact and efficient parameter space that reduces the risk of overfitting and language drift. Our approach also includes a Cut-Mix-Unmix data-augmentation technique to enhance the quality of multi-subject image generation and a simple text-based image editing framework. Our proposed SVDiff method has a significantly smaller model size (1.7 MB for Stable Diffusion) compared to existing methods, making it more practical for real-world applications.
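
To illustrate the core idea of fine-tuning only the singular values of a pretrained weight matrix, the following is a minimal PyTorch sketch. It decomposes a frozen weight as W = U diag(sigma) V^T and optimizes only a per-singular-value shift, reconstructing the weight with a ReLU to keep the shifted spectrum non-negative. The class name SpectralShiftLinear and the training details are illustrative assumptions, not the authors' reference implementation.

import torch
import torch.nn as nn

class SpectralShiftLinear(nn.Module):
    """Keeps the pretrained weight frozen and fine-tunes only a
    'spectral shift' added to its singular values.
    (Illustrative sketch; names and training details are assumptions.)"""

    def __init__(self, pretrained_weight: torch.Tensor):
        super().__init__()
        # SVD of the frozen pretrained weight: W = U diag(sigma) V^T
        U, sigma, Vh = torch.linalg.svd(pretrained_weight, full_matrices=False)
        self.register_buffer("U", U)          # frozen left singular vectors
        self.register_buffer("sigma", sigma)  # frozen singular values
        self.register_buffer("Vh", Vh)        # frozen right singular vectors
        # The only trainable parameters: one shift per singular value.
        self.delta = nn.Parameter(torch.zeros_like(sigma))

    def weight(self) -> torch.Tensor:
        # ReLU keeps the shifted singular values non-negative.
        return self.U @ torch.diag(torch.relu(self.sigma + self.delta)) @ self.Vh

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight().T


# Usage: only `delta` (one scalar per singular value) is optimized, which is
# why the resulting checkpoint is megabytes rather than the full model size.
layer = SpectralShiftLinear(torch.randn(320, 320))
optimizer = torch.optim.AdamW([layer.delta], lr=1e-3)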

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Han_2023_ICCV,
    author    = {Han, Ligong and Li, Yinxiao and Zhang, Han and Milanfar, Peyman and Metaxas, Dimitris and Yang, Feng},
    title     = {SVDiff: Compact Parameter Space for Diffusion Fine-Tuning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {7323-7334}
}