SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation

Aleksey Bokhovkin, Quan Meng, Shubham Tulsiani, Angela Dai; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 628-639

Abstract


We present SceneFactor, a diffusion-based approach for large-scale 3D scene generation that enables controllable generation and effortless editing. SceneFactor supports text-guided 3D scene synthesis through our factored diffusion formulation, leveraging latent semantic and geometric manifolds for the generation of arbitrarily-sized 3D scenes. While text input enables easy, controllable generation, text guidance remains imprecise for intuitive, localized editing and manipulation of the generated 3D scenes. Our factored semantic diffusion instead generates a proxy semantic space composed of semantic 3D boxes, which enables controllable editing of generated scenes by adding, removing, or resizing these proxy boxes; the edited semantic map then guides high-fidelity, consistent 3D geometric editing. Extensive experiments demonstrate that our approach enables high-fidelity 3D scene synthesis with effective controllable editing through our factored diffusion approach.
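
The abstract describes a two-stage, factored formulation: a semantic diffusion model first produces a latent semantic map of proxy 3D boxes, and a geometric diffusion model conditioned on that map then produces the scene geometry. Below is a minimal PyTorch sketch of such a two-stage conditional sampling pipeline; the module names, latent shapes, noise schedule, and DDPM-style update are illustrative assumptions, not the authors' implementation (text conditioning and timestep embeddings are omitted for brevity).

import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for a 3D U-Net; predicts noise from (latent, timestep, condition).
    The timestep is accepted but unused in this toy model."""
    def __init__(self, channels, cond_channels=0):
        super().__init__()
        self.net = nn.Conv3d(channels + cond_channels, channels,
                             kernel_size=3, padding=1)

    def forward(self, x, t, cond=None):
        if cond is not None:
            # Concatenate the conditioning latent along the channel axis.
            x = torch.cat([x, cond], dim=1)
        return self.net(x)

@torch.no_grad()
def ddpm_sample(model, shape, cond=None, steps=50):
    """Minimal DDPM-style ancestral sampling loop with an illustrative
    linear beta schedule."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)
    for t in reversed(range(steps)):
        eps = model(x, t, cond)
        a, ab = alphas[t], alpha_bars[t]
        # Posterior mean of the reverse step given the predicted noise.
        mean = (x - (1 - a) / (1 - ab).sqrt() * eps) / a.sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise
    return x

# Stage 1: semantic diffusion -> latent semantic map (proxy 3D boxes).
semantic_model = ToyDenoiser(channels=8)
semantic_latent = ddpm_sample(semantic_model, shape=(1, 8, 16, 16, 16))

# Stage 2: geometric diffusion conditioned on the semantic latent.
geometric_model = ToyDenoiser(channels=4, cond_channels=8)
geometry_latent = ddpm_sample(geometric_model, shape=(1, 4, 16, 16, 16),
                              cond=semantic_latent)
print(geometry_latent.shape)  # torch.Size([1, 4, 16, 16, 16])

Under this factoring, the localized editing described in the abstract would plausibly amount to modifying the sampled semantic latent (e.g., adding, removing, or resizing box regions) and re-running only the second, geometry stage, leaving the rest of the scene consistent.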

Related Material


[bibtex]
@InProceedings{Bokhovkin_2025_CVPR,
    author    = {Bokhovkin, Aleksey and Meng, Quan and Tulsiani, Shubham and Dai, Angela},
    title     = {SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {628-639}
}