CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion
Kai He, Chin-Hsuan Wu, Igor Gilitschenski
Abstract
Achieving controllable and consistent editing of dynamic 3D scenes remains a significant challenge. Previous work is largely constrained by its editing backbones, resulting in inconsistent edits and limited controllability. We propose to address this challenge with personalized diffusion models. Our framework first fine-tunes the InstructPix2Pix model and then optimizes the scene, represented by deformable 3D Gaussians, in two stages. The fine-tuning enables the model to "learn" the editing capability from a single edited reference image, reducing the complex task of dynamic scene editing to the simpler problem of 2D image editing. By learning editing regions and styles directly from the reference, our approach enables consistent and precise local edits without the need to track the desired editing regions, effectively addressing key challenges in dynamic scene editing. The two-stage optimization then progressively edits the trained dynamic scene, using a designed edited image buffer to accelerate convergence and improve temporal consistency. Compared with state-of-the-art methods, our approach offers more flexible and controllable local scene editing, achieving high-quality and consistent results.
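The abstract outlines the pipeline at a high level: fine-tune InstructPix2Pix on a single edited reference image, then progressively optimize a deformable 3D Gaussian scene against an edited image buffer. The sketch below illustrates that optimization loop in toy form only; `DeformableGaussians`, `PersonalizedEditor`, `optimize_scene`, and all hyperparameters are illustrative assumptions, not the authors' released code.

```python
# Minimal, runnable sketch of the buffer-based optimization described in the
# abstract. Every class and function here is a hypothetical stand-in.
import random
import torch
import torch.nn as nn


class DeformableGaussians(nn.Module):
    """Toy stand-in for a deformable 3D Gaussian scene: maps time -> RGB frame."""

    def __init__(self, h: int = 64, w: int = 64):
        super().__init__()
        self.canvas = nn.Parameter(torch.rand(3, h, w))  # static appearance
        self.deform = nn.Linear(1, 3 * h * w)            # time-conditioned deformation

    def render(self, t: torch.Tensor) -> torch.Tensor:
        offset = self.deform(t.view(1, 1)).view_as(self.canvas)
        return torch.sigmoid(self.canvas + 0.1 * offset)


class PersonalizedEditor:
    """Stand-in for InstructPix2Pix fine-tuned on one edited reference image."""

    def __init__(self, reference_edit: torch.Tensor):
        self.reference_edit = reference_edit

    @torch.no_grad()
    def edit(self, frame: torch.Tensor) -> torch.Tensor:
        # A real editor would denoise `frame` toward the learned edit style;
        # blending toward the reference merely keeps this sketch executable.
        return 0.5 * frame + 0.5 * self.reference_edit


def optimize_scene(scene, editor, times, steps=200, refresh_every=20):
    """Fit the scene to an edited image buffer that is refreshed as training proceeds."""
    buffer = {}  # edited image buffer: timestamp -> edited frame
    opt = torch.optim.Adam(scene.parameters(), lr=1e-2)
    for step in range(steps):
        ti = random.choice(times)
        t = torch.tensor([ti])
        # Refresh stale buffer entries so the supervision tracks the current
        # scene, which speeds convergence and keeps frames temporally consistent.
        if ti not in buffer or step % refresh_every == 0:
            buffer[ti] = editor.edit(scene.render(t).detach())
        loss = nn.functional.l1_loss(scene.render(t), buffer[ti])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return scene


if __name__ == "__main__":
    scene = DeformableGaussians()
    editor = PersonalizedEditor(reference_edit=torch.rand(3, 64, 64))
    optimize_scene(scene, editor, times=[i / 10.0 for i in range(10)])
```

The periodic buffer refresh mirrors the iterative dataset-update pattern common in diffusion-guided 3D editing: rendered frames are re-edited as the scene improves, so the optimization target and the scene converge together rather than chasing a fixed set of independently edited frames.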
Related Material

[pdf] [supp] [bibtex]

@InProceedings{He_2025_CVPR,
    author    = {He, Kai and Wu, Chin-Hsuan and Gilitschenski, Igor},
    title     = {CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {26630-26640}
}