DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing

Shi, Yujun; Xue, Chuhui; Liew, Jun Hao; Pan, Jiachun; Yan, Hanshu; Zhang, Wenqing; Tan, Vincent Y. F.; Bai, Song

Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 8839-8849

Abstract

Accurate and controllable image editing is a challenging task that has attracted significant attention recently. Notably DragGAN developed by Pan et al. (2023) is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision. However due to its reliance on generative adversarial networks (GANs) its generality is limited by the capacity of pretrained GAN models. In this work we extend this editing framework to diffusion models and propose a novel approach DragDiffusion. By harnessing large-scale pretrained diffusion models we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images. Unlike other diffusion-based editing methods that provide guidance on diffusion latents of multiple time steps our approach achieves efficient yet accurate spatial control by optimizing the latent of only one time step. This novel design is motivated by our observations that UNet features at a specific time step provides sufficient semantic and geometric information to support the drag-based editing. Moreover we introduce two additional techniques namely identity-preserving fine-tuning and reference-latent-control to further preserve the identity of the original image. Lastly we present a challenging benchmark dataset called DragBench---the first benchmark to evaluate the performance of interactive point-based image editing methods. Experiments across a wide range of challenging cases (e.g. images with multiple objects diverse object categories various styles etc.) demonstrate the versatility and generality of DragDiffusion. Code and the DragBench dataset: https://github.com/Yujun-Shi/DragDiffusion.

Related Material

[pdf] [supp] [arXiv]

[bibtex]

@InProceedings{Shi_2024_CVPR, author = {Shi, Yujun and Xue, Chuhui and Liew, Jun Hao and Pan, Jiachun and Yan, Hanshu and Zhang, Wenqing and Tan, Vincent Y. F. and Bai, Song}, title = {DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {8839-8849} }