UniLight: A Unified Representation for Lighting

Supplementary Material

Table of Contents

Additional implementation details:

  1. VLM (InternVL3) prompt templates, extending Sec. 3 - Text in the paper.
  2. Brightest region identification algorithm, extending Sec. 3 - Text in the paper.
  3. Example descriptions of different lengths, extending Sec. 3 - Text in the paper.
  4. Text encoder (Qwen3) instructions, extending Sec. 4.1 - Text encoder in the paper.
  5. Envmap generation pipeline, extending Sec. 5.3 in the paper.

Additional experiment results:

  1. Light retrieval results, extending Sec. 5.2 in the paper.
  2. Environment map generation - Indoor, extending Sec. 5.3 in the paper.
  3. Environment map generation - Outdoor, extending Sec. 5.3 in the paper.
  4. Light control / relighting - Indoor, extending Sec. 5.4 in the paper.
  5. Light control / relighting - Outdoor, extending Sec. 5.4 in the paper.

1. VLM (InternVL3) prompt templates

The VLM is fed three inputs: the cropped RGB image, its corresponding Reinhard-tonemapped envmap, and the brightest-region information extracted by the algorithm described in Section 2. The VLM is first instructed to generate a detailed one-paragraph lighting description. Based on this detailed paragraph, it is then instructed to produce two summary variants: a two-sentence version and a few-words version.
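The tonemapping step mentioned above can be illustrated with a minimal sketch. It assumes the basic global Reinhard curve x / (1 + x) applied per channel, followed by gamma encoding. The exact Reinhard variant and any gamma value are not stated here, so `reinhard_tonemap`, `tonemap_image`, and `gamma=2.2` are illustrative assumptions rather than the pipeline's actual settings.

```python
def reinhard_tonemap(pixel, gamma=2.2):
    """Map one linear HDR RGB pixel into the display range [0, 1).

    Assumption: basic per-channel Reinhard curve x / (1 + x) followed by
    gamma encoding. The variant and gamma used in the paper are not given.
    """
    out = []
    for x in pixel:
        x = max(x, 0.0)             # guard against negative HDR values
        compressed = x / (1.0 + x)  # compresses [0, inf) into [0, 1)
        out.append(compressed ** (1.0 / gamma))
    return tuple(out)


def tonemap_image(image):
    """Apply the operator to an image stored as rows of (R, G, B) tuples."""
    return [[reinhard_tonemap(p) for p in row] for row in image]


if __name__ == "__main__":
    # Two hypothetical HDR pixels: a dim one and a very bright one.
    hdr = [[(0.0, 0.5, 1.0), (13811.5, 9000.0, 4000.0)]]
    for row in tonemap_image(hdr):
        print(row)
```

Because the curve saturates smoothly, very bright sources such as the sun stay below 1.0 instead of clipping, which keeps the light sources visible to the VLM in the LDR panorama.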

One-paragraph prompt (detailed)


You are an expert in analyzing {scene_type (indoor/outdoor)} scene lighting. Your task is to describe the lighting in the image with technical accuracy.
Based on the provided images (cropped view, panorama, and coordinate map), write a concise, single-paragraph description of the lighting as seen from the perspective of the cropped image. \n

The following is the content of this paragraph:
    (for indoor scenes) Start directly, begin the paragraph by immediately describing the most significant light source. Do not use introductory sentences like "The scene is illuminated by..." or "There are several light sources."
    1. Identify Key Light Sources and Describe Each Source: Use the direct light source position and brightness information provided below, and use the panorama for full context, identify the dominant light sources that directly illuminate the scene in the cropped image. These can include windows, lamps, strip lights, or other fixtures. For each significant and dominant light source, describe its type (e.g., window, recessed strip light), position relative to the view (similar to which in the direct light information), color (e.g., warm yellow, neutral white), and brightness (e.g., bright, soft). Only one concise sentence should be used for each light source.
    2. Use one short and concise sentence to describe the overall color of the scene.

    (for outdoor scenes) Start directly, begin the paragraph by immediately describing the most significant light source. Do not use introductory sentences like "The scene is illuminated by..." or "There are several light sources."
    1. Describe the Primary Natural Light: Identify the main source of natural light (e.g., the sun) with the light information provided below. In a single sentence, describe its direction relative to the view (similar to which in the direct light information), its color/hue (e.g., "warm golden," "cool blue"), and its brightness (e.g., "bright and direct," "soft and diffused").
    2. Detail Any Artificial Lights: If any artificial lights are active and visible (like streetlights or building lights), briefly describe their type, location, and color.
    3. Use one short and concise sentence to describe the overall color of the scene.

Important formatting requirements:
- Must give the correct and faithful description based on the lighting conditions of the scene
- Do not mention the coordinate colors in your final output.
- Make sure this paragraph flows naturally, and avoid redundancy
- Write in complete sentences without using bullet points, dashes, or numbered lists
- Do not use bullet points, dashes (-), or numbered lists
- Provide concise and brief descriptions
- Do not use words expressing uncertainty like 'appears to be', 'seems to', 'likely', or 'suggests'. State the lighting conditions as fact
- Do not use words like 'cropped image', 'cropped view', 'panorama' in your final output

You analysis should:
Use the Panorama for Context: the panorama provides a complete 360-degree view of all light sources. Use this to understand the lighting, but focus your description only on the lights that directly and strongly illuminate the scene in the cropped image.
Use the direct light source position and brightness information below (very important, the most precise information) to understand the lighting conditions

Here is some auxiliary information about the light sources in the scene to help you better understand the lighting conditions:\n

"Light {i}: maximum brightness {light['max_brightness']}, position description: {light['position_description']}, theta (elevation angle on envmap, center is 0): {light['theta_deg']}, phi (azimuthal angle on envmap, center is 0): {light['phi_deg']} \n"
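The auxiliary line above reads like a Python f-string that is filled once per detected light. A minimal sketch of that assembly follows, assuming each light is a dictionary with the fields shown in the example output of Section 2. The helper name `format_light_info` and the zero-based index are assumptions.

```python
def format_light_info(lights):
    """Render one auxiliary line per light, following the template above.

    `lights` is assumed to be a list of dicts carrying the keys
    max_brightness, position_description, theta_deg, and phi_deg.
    """
    lines = []
    for i, light in enumerate(lights):  # zero-based index is an assumption
        lines.append(
            f"Light {i}: maximum brightness {light['max_brightness']}, "
            f"position description: {light['position_description']}, "
            f"theta (elevation angle on envmap, center is 0): {light['theta_deg']}, "
            f"phi (azimuthal angle on envmap, center is 0): {light['phi_deg']} \n"
        )
    return "".join(lines)


if __name__ == "__main__":
    # Values copied from the example record in Section 2.
    example = [{
        "max_brightness": 13811.497809960936,
        "position_description": "High up, in the back",
        "theta_deg": 56.25,
        "phi_deg": 178.59375,
    }]
    print(format_light_info(example), end="")
```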

        

Summary — two sentences


You are an expert in analyzing {scene_type} scene lighting. Given existing lighting description, your task is to summarize current descriptions according to the provided images (cropped view, panorama, and coordinate map).

Important requirements:
- Must give the correct and faithful description based on the lighting conditions of the scene
- Use in total of two sentences, one for direct lighting that dominates the scene lighting, must describe what where these light sources are and their positions. And the other one describe the overall lighting, focus on the color (must have) and brightness
- Do not include any additional information or context beyond the lighting description
- These two sentences should be clearly separated, and very short and concise, like what humans will say
- Make sure them flows naturally, and avoid redundancy
- Write in complete sentences without using bullet points, dashes, or numbered lists
- Do not use words expressing uncertainty like 'appears to be', 'seems to', 'likely', or 'suggests'. State the lighting conditions as fact

Current light description: {cur_light_description}

        

Summary — few words


You are an expert in analyzing {scene_type} scene lighting. Given existing lighting description, your task is to summarize current descriptions to a few words according to the provided images (cropped view and panorama) and the direct light source position and brightness information.

Important requirements:
- Must give the correct and faithful description based on the lighting conditions of the scene
- Use a few phrases (not complete sentences) to summarize the scene lighting condition
- Describe predominant direct light sources (like "a bright sun from the upper right", etc.) and overall scene lighting.
- Do not include any additional information or context beyond the lighting description
- Separate the phrases with commas
- Do not use words expressing uncertainty like 'appears to be', 'seems to', 'likely', or 'suggests'. State the lighting conditions as fact

Current light description: {cur_light_description}
        

2. Brightest region identification algorithm

We use the following algorithm to locate the dominant light sources in an HDR envmap:

  1. Read the HDR envmap.
  2. Compute luminance per pixel: L = 0.2126 R + 0.7152 G + 0.0722 B.
  3. Threshold the luminance to find bright regions. If no connected components are found, iteratively lower the threshold (divide it by sqrt(2)) until at least one bright region appears or the lower bound (sqrt(2)/2) is reached.
  4. Label connected components (connectivity=2). For each connected component, compute:
    • area (pixel count)
    • total flux (sum of luminance over the component)
    • peak brightness (maximum luminance inside the component) and the pixel coordinates of that peak
  5. Filter out tiny components smaller than min_area_pixels (20 pixels), but keep at least one candidate if nothing larger exists.
  6. Rank remaining candidates by total flux (importance) and select the top-N (N=4) light sources.
  7. Convert the pixel coordinates of each selected light into spherical coordinates:
    • u = x / width, v = y / height
    • phi = (u - 0.5) * 2π (azimuth, 0 = front)
    • theta = (0.5 - v) * π (elevation, 0 = horizon)
  8. Produce a small structured record for each light with fields such as pixel coords, u/v, theta/phi (deg & rad), peak brightness, area, total flux, and a short position string (e.g., "up, on the right").
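Steps 2 to 6 can be sketched in pure Python as below. This is an illustration under stated assumptions, not the authors' implementation. The initial threshold is not given above, so `initial_threshold=8.0` is hypothetical. The sqrt(2)/2 bound is read as a floor on the threshold, connectivity=2 is read as 8-connectivity, and the fallback keeps the single highest-flux component. The sketch does not merge regions that wrap around the horizontal seam of the panorama.

```python
import math
from collections import deque


def luminance(rgb):
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def connected_components(mask):
    """Group True pixels with 8-connectivity; returns lists of (x, y)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(x0, y0)]), []
            while queue:  # breadth-first flood fill
                x, y = queue.popleft()
                pixels.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            components.append(pixels)
    return components


def find_lights(image, initial_threshold=8.0, min_area_pixels=20, top_n=4):
    """image: rows of linear HDR (R, G, B) tuples. Returns ranked records."""
    lum = [[luminance(p) for p in row] for row in image]  # step 2
    threshold, floor = initial_threshold, math.sqrt(2) / 2
    while True:  # step 3: lower the threshold until something is found
        mask = [[v > threshold for v in row] for row in lum]
        comps = connected_components(mask)
        if comps or threshold <= floor:
            break
        threshold = max(threshold / math.sqrt(2), floor)
    candidates = []
    for pixels in comps:  # step 4: per-component statistics
        peak_x, peak_y = max(pixels, key=lambda p: lum[p[1]][p[0]])
        candidates.append({
            "area_size": len(pixels),
            "total_flux": sum(lum[y][x] for x, y in pixels),
            "max_brightness": lum[peak_y][peak_x],
            "pixel_x": peak_x,
            "pixel_y": peak_y,
        })
    kept = [c for c in candidates if c["area_size"] >= min_area_pixels]  # step 5
    if not kept and candidates:
        kept = [max(candidates, key=lambda c: c["total_flux"])]
    kept.sort(key=lambda c: c["total_flux"], reverse=True)  # step 6
    kept = kept[:top_n]
    for rank, c in enumerate(kept):
        c["rank"] = rank
    return kept
```

Ranking by total flux rather than peak brightness favors large emitters such as windows over small specular highlights, which matches the stated goal of finding dominant light sources.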

Angular-to-text mapping

For position descriptions, we map elevation (θ) and azimuth (φ) to short phrases as follows.

Vertical position mapping (θ)

  Angular range (degrees)          Description
  (-90, -45]                       low down
  (-45, -22.5]                     down
  (-22.5, 22.5]                    (horizontal)
  (22.5, 45]                       up
  (45, 90]                         high up

Horizontal position mapping (φ)

  Angular range (degrees)          Description
  (-22.5, 22.5]                    in the front
  (22.5, 67.5]                     on the front-right
  (67.5, 112.5]                    on the right
  (112.5, 157.5]                   on the back-right
  (157.5, 180] or (-180, -157.5]   in the back
  (-157.5, -112.5]                 on the back-left
  (-112.5, -67.5]                  on the left
  (-67.5, -22.5]                   on the front-left
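The two mappings above can be expressed as a small lookup, sketched below. Two points are assumptions: the parenthesized "(horizontal)" entry is treated as "no vertical qualifier", and the first letter is capitalized to match the example record ("High up, in the back"). The function names are hypothetical.

```python
def vertical_phrase(theta_deg):
    """Map elevation in degrees to a phrase using half-open (low, high] bins."""
    if theta_deg > 45:
        return "high up"
    if theta_deg > 22.5:
        return "up"
    if theta_deg > -22.5:
        return ""  # assumption: the "(horizontal)" band adds no vertical word
    if theta_deg > -45:
        return "down"
    return "low down"


def horizontal_phrase(phi_deg):
    """Map azimuth in degrees (0 = front, positive = right) to a phrase."""
    phi = (phi_deg + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    if phi <= -157.5 or phi > 157.5:
        return "in the back"
    if phi <= -112.5:
        return "on the back-left"
    if phi <= -67.5:
        return "on the left"
    if phi <= -22.5:
        return "on the front-left"
    if phi <= 22.5:
        return "in the front"
    if phi <= 67.5:
        return "on the front-right"
    if phi <= 112.5:
        return "on the right"
    return "on the back-right"


def describe_position(theta_deg, phi_deg):
    """Join the vertical and horizontal phrases into one position string."""
    parts = [p for p in (vertical_phrase(theta_deg), horizontal_phrase(phi_deg)) if p]
    text = ", ".join(parts)
    return text[0].upper() + text[1:]


if __name__ == "__main__":
    print(describe_position(56.25, 178.59375))
    print(describe_position(0.0, -90.0))
```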

Example output:


{
    "ulaval_outdoor": {
        "9C4A0006": {
            "000": [
                {
                "rank": 0,
                "total_flux": 86642.76549932986,
                "max_brightness": 13811.497809960936,
                "area_size": 4926,
                "u": 0.99609375,
                "v": 0.1875,
                "pixel_x": 1020,
                "pixel_y": 96,
                "theta_deg": 56.25,
                "phi_deg": 178.59375,
                "theta_rad": 0.9817477042468103,
                "phi_rad": 3.117048960983623,
                "position_description": "High up, in the back"
                },
                ...
            ]
        }
    }
}
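As a consistency check, the conversion from step 7 can be applied to the peak pixel of the example record. The sketch below assumes a 1024 x 512 envmap, a resolution inferred from the record itself (pixel_x / u = 1020 / 0.99609375 and pixel_y / v = 96 / 0.1875). The function name `pixel_to_spherical` is hypothetical.

```python
import math


def pixel_to_spherical(pixel_x, pixel_y, width, height):
    """Convert equirectangular pixel coordinates to spherical angles.

    phi is the azimuth (0 = front) and theta the elevation (0 = horizon),
    following the formulas in step 7.
    """
    u = pixel_x / width
    v = pixel_y / height
    phi = (u - 0.5) * 2.0 * math.pi
    theta = (0.5 - v) * math.pi
    return {
        "u": u,
        "v": v,
        "theta_rad": theta,
        "phi_rad": phi,
        "theta_deg": math.degrees(theta),
        "phi_deg": math.degrees(phi),
    }


if __name__ == "__main__":
    # Peak pixel from the example record; 1024 x 512 is an inferred size.
    record = pixel_to_spherical(1020, 96, width=1024, height=512)
    for key, value in record.items():
        print(f"{key}: {value}")
```

Up to floating-point rounding, the result agrees with the u, v, theta, and phi fields in the record, and the angles fall in the "high up" and "in the back" bins of the mapping tables.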
    

3. Example descriptions of different lengths

Outdoor scene example (the RGB image and its corresponding envmap)

[Images: outdoor scene with sunlight over water]
One-paragraph description:

The primary natural light source is the sun, positioned high in the sky to the right, casting a bright and direct light with a cool blue hue. There are no visible artificial lights in the scene. The overall color of the scene is a mix of cool blues and greens, reflecting the sunlight on the water and the surrounding landscape.
        
One-sentence summary:

A bright sun from above to the right illuminates the scene with a cool blue hue, reflecting off the water and casting a serene light on the landscape.
        
Few-words summary:

Bright sun from the upper right, cool blue hue, sunlight reflecting on water, cool blues and greens.
        

Indoor scene example (the RGB image and its corresponding envmap)

[Images: indoor scene]
One-paragraph description:

A bright window on the left side of the scene provides natural light, casting a warm yellow hue across the room. A recessed strip light on the ceiling, positioned above and slightly to the right, emits a soft, warm light, contributing to the overall illumination. The scene has a warm, yellowish color tone, enhancing the cozy atmosphere.

One-sentence summary:

A bright window on the left and a recessed strip light on the ceiling provide warm, yellowish illumination, creating a cozy atmosphere.
        
Few-words summary:

Bright window on the left, recessed strip light on the ceiling, warm yellow hue, cozy atmosphere.
        

4. Text encoder (Qwen3) instructions


You are an embedding model. Encode the scene lighting description for similarity search and image generation conditioning. The embeddings must capture the position (left, right, above, back, etc.) and the color of the dominant light sources (very important). And include the overall brightness, color temperature, and mood of the scene.
        

5. Envmap generation pipeline

We fine-tune Stable Diffusion 3.5 Medium to output an LDR environment map and repurpose its text-conditioning branch to accept our lighting embedding.

[Figure: envmap generation pipeline]