SAIL: Self-supervised Learning of Lighting-Invariant Representations from Real Images with Latent Diffusion

WACV 2026

1IBISC, Univ. Evry Paris-Saclay,
2L Research

We propose a self-supervised albedo-like estimation method built on a latent diffusion model for real images. From a single image captured under real-world lighting conditions, SAIL extracts a high-fidelity light-invariant representation by repurposing and fine-tuning a pretrained latent diffusion model (left). The estimated albedo enables downstream tasks such as single-image virtual relighting, demonstrated in Blender with different environment maps (right).

Results

Light-invariant estimation for images under complex real-world lighting conditions.

[Comparison figures: Input | Latent Intrinsics | SAIL (Ours)]


Consistency across different illuminations.

SAIL produces more robust and consistent estimates across varying illuminations of the same scene.




Applications

Color editing across different illuminations.

Albedo-based color edits remain consistent across varying illuminations of the same scene.




Unconditioned image relighting

Our method enables unconditioned, realistic relighting from a single image: by predicting albedos unaffected by illumination, diverse lighting conditions can be generated without any explicit supervision or conditioning.


Virtual image relighting

We show virtual relighting results using Blender with different environment maps.




Abstract

Intrinsic image decomposition aims at separating an image into its underlying albedo and shading components, isolating the base color from lighting effects to enable downstream applications such as virtual relighting and scene editing. Despite the rise and success of learning-based approaches, intrinsic image decomposition from real-world images remains a highly challenging task due to the scarcity of labeled ground-truth data. Most existing solutions rely on synthetic data in supervised setups, limiting their ability to generalize to real-world scenes. Self-supervised methods, on the other hand, often produce albedo-like maps that contain reflections and lack consistency under different lighting conditions. To address this, we propose SAIL, an approach designed to estimate illumination-invariant representations from single-view real-world images, specifically targeting plausible relighting. We repurpose the prior knowledge of a latent diffusion model for unconditioned scene relighting as a surrogate objective for learning light-invariant estimates. To achieve this, we introduce a novel intrinsic image decomposition fully formulated in the latent space. To guide the training of our latent diffusion model, we introduce regularization terms that constrain both the lighting-dependent and lighting-independent components of our latent image decomposition. Through our experiments, we demonstrate that SAIL produces stable albedo-like representations under varying lighting conditions and generalizes to multiple scenes, using only unlabeled multi-illumination data available online.
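To make the latent-space decomposition idea concrete, here is a minimal NumPy sketch, not the paper's actual formulation: the function names, the additive split of each image latent into a shared light-invariant latent plus a per-illumination lighting latent, and the specific regularizer are all illustrative assumptions.

```python
import numpy as np

def latent_decomposition_losses(z_images, z_invariant, z_lighting):
    """Illustrative losses for a latent intrinsic decomposition (sketch).

    z_images:    (N, D) latents of one scene under N illuminations
    z_invariant: (D,)   predicted light-invariant (albedo-like) latent,
                        shared across all illuminations of the scene
    z_lighting:  (N, D) predicted lighting-dependent latents, one per image
    """
    # Reconstruction: each image latent should be explained by the shared
    # invariant latent plus its per-illumination lighting latent.
    recon = np.mean((z_images - (z_invariant[None, :] + z_lighting)) ** 2)
    # Regularizer on the lighting-dependent component: penalizing its energy
    # pushes scene content into the shared invariant latent, so that only
    # illumination variation is absorbed by z_lighting.
    lighting_reg = np.mean(z_lighting ** 2)
    return recon, lighting_reg
```

In this toy formulation, when the predicted latents exactly decompose the image latents, the reconstruction loss vanishes and only the regularizer remains, which is what drives the invariant latent to absorb everything common across illuminations.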