Our method tackles text-to-3D scene generation by first creating a panoramic image with a fine-tuned diffusion model, which serves as a geometric and stylistic prior. Relevant object instances are segmented, reconstructed at high fidelity, and placed back into the background environment. The background itself is optimized for immersive viewing with a combination of 2D and 3D inpainting techniques. The resulting scenes are more immersive and retain higher structural coherence under large camera offsets than those of existing methods, making them well suited for applications such as editing and 3D content transfer.
Autumn park scene with people sitting on benches surrounded by colorful trees, storybook illustration style.