Towards Text-guided 3D Scene Composition

Zhang, Qihang; Wang, Chaoyang; Siarohin, Aliaksandr; Zhuang, Peiye; Xu, Yinghao; Yang, Ceyuan; Lin, Dahua; Zhou, Bolei; Tulyakov, Sergey; Lee, Hsin-Ying

Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 6829-6838

Abstract

We are witnessing significant breakthroughs in the technology for generating 3D objects from text. Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. Generating entire scenes, however, remains very challenging, as a scene contains multiple 3D objects that are diverse and scattered. In this work, we introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text. We marry the locality of objects with the globality of scenes by introducing a hybrid 3D representation: explicit for objects and implicit for scenes. Remarkably, an object being represented explicitly can be either generated from text using conventional text-to-3D approaches or provided by users. To configure the layout of the scene and automatically place objects, we apply the Particle Swarm Optimization technique during the optimization process. Furthermore, it is difficult for certain parts of the scene (e.g., corners, occlusion) to receive multi-view supervision, leading to inferior geometry. We incorporate an RGBD panorama diffusion model to mitigate it, resulting in high-quality geometry. Extensive evaluation supports that our approach achieves superior quality over previous approaches, enabling the generation of detailed and view-consistent 3D scenes. Our project website is at https://zqh0253.github.io/SceneWiz3D.
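The abstract names Particle Swarm Optimization as the technique used to place objects in the scene, but gives no details. As a rough illustration of the general technique only, the sketch below runs a textbook PSO on a toy 2D layout problem. The function names (`pso_minimize`, `layout_cost`), the anchor positions, the overlap penalty, and all hyperparameters are hypothetical and are not taken from the paper, whose actual objective and parameterization are not described here.

```python
import math
import random

# Hypothetical toy setup: two objects, each with a preferred 2D position.
ANCHORS = [(1.0, 1.0), (-1.0, -1.0)]
MIN_DIST = 1.0  # assumed minimum spacing between object centers


def layout_cost(layout):
    """Toy cost: distance to preferred positions plus an overlap penalty."""
    centers = [(layout[2 * i], layout[2 * i + 1]) for i in range(len(ANCHORS))]
    cost = sum((x - ax) ** 2 + (y - ay) ** 2
               for (x, y), (ax, ay) in zip(centers, ANCHORS))
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            dist = math.hypot(centers[i][0] - centers[j][0],
                              centers[i][1] - centers[j][1])
            cost += 10.0 * max(0.0, MIN_DIST - dist) ** 2
    return cost


def pso_minimize(cost, dim, lo, hi, n_particles=30, n_iters=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Standard PSO over the box [lo, hi]^dim; returns (best_pos, best_cost)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]           # per-particle best positions
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_cost[i])
    g_pos, g_cost = best_pos[g][:], best_cost[g]  # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull to own best, pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < best_cost[i]:
                best_cost[i], best_pos[i] = c, pos[i][:]
                if c < g_cost:
                    g_cost, g_pos = c, pos[i][:]
    return g_pos, g_cost
```

Calling `pso_minimize(layout_cost, dim=4, lo=-3.0, hi=3.0)` searches over the flattened vector `[x0, y0, x1, y1]`. PSO needs only cost evaluations, not gradients, which is one plausible reason to use it for discrete-feeling choices such as object placement.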

Related Material

[bibtex]

@InProceedings{Zhang_2024_CVPR,
    author    = {Zhang, Qihang and Wang, Chaoyang and Siarohin, Aliaksandr and Zhuang, Peiye and Xu, Yinghao and Yang, Ceyuan and Lin, Dahua and Zhou, Bolei and Tulyakov, Sergey and Lee, Hsin-Ying},
    title     = {Towards Text-guided 3D Scene Composition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {6829-6838}
}