PS-Diffusion: Photorealistic Subject-Driven Image Editing with Disentangled Control and Attention

Wang, Weicheng; Jia, Guoli; Zhang, Zhongqi; Lin, Liang; Yang, Jufeng

Weicheng Wang, Guoli Jia, Zhongqi Zhang, Liang Lin, Jufeng Yang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 18302-18312

Abstract

Diffusion models pre-trained on large-scale paired image-text data have achieved significant success in image editing. To convey more fine-grained visual details, subject-driven editing integrates subjects from user-provided reference images into existing scenes. However, obtaining photorealistic results is challenging, because merging the target object into the source image induces contextual interactions, such as reflections, illumination, and shadows, that must be simulated. To address this issue, we propose PS-Diffusion, which ensures realistic and consistent object-scene blending while keeping the subject's appearance invariant during editing. Specifically, we first divide the contextual interactions into those occurring in the foreground and those occurring in the background. The effect of the former is estimated through intrinsic image decomposition, and the region of the latter is predicted by an additional background effect control branch. Moreover, we propose an effect attention module that disentangles the learning of interactions from the learning of the subject, alleviating confusion between them. Additionally, we introduce Replace-5K, a synthesized dataset of 5,000 image pairs generated via 3D rendering, which keep the subject invariant while including contextual interactions. Extensive quantitative and qualitative experiments on our dataset and two real-world datasets demonstrate that our method achieves state-of-the-art performance. The code is available at https://github.com/wei-cheng777/PS-Diffusion.
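The abstract states that foreground interactions are estimated through intrinsic image decomposition. As a hedged illustration of that general idea only (not the PS-Diffusion implementation), the classic Lambertian intrinsic image model writes an image as the per-pixel product of albedo and shading, I = A * S. The sketch below assumes single-channel images stored as nested lists and uses hypothetical helpers `decompose` and `recompose`; the shading maps are given by hand rather than estimated. It shows how a subject's intrinsic appearance (albedo) can stay fixed while its shading is swapped for one consistent with a new scene.

```python
# Hedged sketch of the Lambertian intrinsic image model I = A * S.
# Not the PS-Diffusion implementation; helper names are hypothetical.
# Images are single-channel (grayscale) nested lists of floats in [0, 1].


def decompose(image, shading, eps=1e-6):
    """Recover albedo A = I / S per pixel, given an estimated shading map S."""
    return [
        [i / max(s, eps) for i, s in zip(row_i, row_s)]  # eps guards against S = 0
        for row_i, row_s in zip(image, shading)
    ]


def recompose(albedo, shading):
    """Re-render I' = A * S' using a new shading map S'."""
    return [
        [a * s for a, s in zip(row_a, row_s)]
        for row_a, row_s in zip(albedo, shading)
    ]


# Subject as it appears in the reference image, with its reference shading.
subject = [[0.4, 0.8], [0.2, 0.6]]
reference_shading = [[0.5, 1.0], [0.5, 1.0]]

# Hypothetical shading the target scene would cast on the subject
# (here the right column falls into a shadow).
target_shading = [[1.0, 0.25], [1.0, 0.25]]

albedo = decompose(subject, reference_shading)  # appearance kept invariant
relit = recompose(albedo, target_shading)       # scene-consistent foreground
print(albedo)
print(relit)
```

Because the albedo is recovered once and only the shading map changes, the subject's intrinsic appearance is preserved by construction. In a real system both maps would have to be estimated (for example by a learned decomposition network), which this toy example does not attempt.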

Related Material

@InProceedings{Wang_2025_CVPR,
    author    = {Wang, Weicheng and Jia, Guoli and Zhang, Zhongqi and Lin, Liang and Yang, Jufeng},
    title     = {PS-Diffusion: Photorealistic Subject-Driven Image Editing with Disentangled Control and Attention},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {18302-18312}
}