[pdf] [supp] [bibtex]

@InProceedings{Parihar_2025_ICCV,
  author    = {Parihar, Rishubh and VS, Sachidanand and Babu, R. Venkatesh},
  title     = {Zero-Shot Depth Aware Image Editing with Diffusion Models},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025},
  pages     = {15748-15759}
}
Zero-Shot Depth Aware Image Editing with Diffusion Models
Abstract
Diffusion models have transformed image editing but struggle with precise depth-aware control, such as placing objects at a specified depth. Layered representations offer fine-grained control by decomposing an image into separate editable layers. However, existing methods simplistically represent a scene as a set of background and transparent foreground layers while ignoring the scene geometry - limiting their effectiveness for depth-aware editing. We propose Depth-Guided Layer Decomposition - a layering method that decomposes an image into foreground and background layers based on a user-specified depth value, enabling precise depth-aware edits. We further propose Feature-Guided Layer Compositing - a zero-shot approach for realistic layer compositing that leverages generative priors from pretrained diffusion models. Specifically, we guide the internal U-Net features to progressively fuse individual layers into a composite latent at each denoising step. This preserves the structure of individual layers while generating realistic outputs with appropriate color and lighting adjustments, without the need for post-hoc harmonization models. We demonstrate our method on two key depth-aware editing tasks: 1) scene compositing by blending the foreground of one scene with the background of another at a specified depth; and 2) object insertion at a user-defined depth. Our zero-shot approach achieves precise depth ordering and high-quality edits, surpassing specialized scene compositing and object placement baselines, as validated across benchmarks and user studies.
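The core of the decomposition step can be illustrated with a minimal sketch: given an image and a monocular depth map, split the pixels into a foreground layer (closer than a user-specified depth) and a background layer. This is an assumption-laden toy version - the function name `decompose_by_depth`, the soft sigmoid mask, and the `softness` parameter are hypothetical illustration choices, not the paper's implementation, which operates on diffusion latents and U-Net features.

```python
import numpy as np

def decompose_by_depth(image, depth, d, softness=0.05):
    """Split an image into foreground/background layers at depth threshold d.

    image:    (H, W, 3) float array in [0, 1]
    depth:    (H, W) float array in [0, 1]; smaller values = closer to camera
    d:        user-specified depth threshold in [0, 1]
    softness: width of the sigmoid transition band (hypothetical choice,
              used here to avoid a hard aliased boundary)
    """
    # Soft foreground mask: ~1 where the scene is closer than d, ~0 beyond it.
    alpha = 1.0 / (1.0 + np.exp((depth - d) / softness))  # (H, W)
    alpha = alpha[..., None]                              # broadcast over RGB
    foreground = image * alpha          # transparent foreground layer
    background = image * (1.0 - alpha)  # content at or beyond depth d
    return foreground, background, alpha

# Toy usage: a 2x2 white image with a depth ramp, split at mid-depth.
img = np.ones((2, 2, 3))
dep = np.array([[0.1, 0.2],
                [0.8, 0.9]])
fg, bg, a = decompose_by_depth(img, dep, d=0.5)
# Top row (depth < 0.5) lands in the foreground layer; bottom row in the background.
```

The two layers recompose exactly (`fg + bg == img`), which is what lets the method edit one layer - e.g. swap in a new background - and then re-fuse the stack at the chosen depth.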
