InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models

Yifan Lu, Xuanchi Ren, Jiawei Yang, Tianchang Shen, Zhangjie Wu, Jun Gao, Yue Wang, Siheng Chen, Mike Chen, Sanja Fidler, Jiahui Huang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 27272-27283

Abstract


We present InfiniCube, a scalable and controllable method to generate unbounded and dynamic 3D driving scenes with high fidelity. Previous methods for scene generation are constrained either by their applicability to indoor scenes or by their lack of controllability. In contrast, we take advantage of recent advances in 3D and video generative models to achieve large dynamic scene generation that allows flexible controls through HD maps, vehicle bounding boxes, and text descriptions. First, we construct a map-conditioned 3D voxel generative model to unleash its power for unbounded voxel world generation. Then, we re-purpose a video model and ground it on the voxel world through a set of pixel-aligned guidance buffers, synthesizing a consistent appearance on long-video generation for large-scale scenes. Finally, we propose a fast feed-forward approach that employs both voxel and pixel branches to lift videos to dynamic 3D Gaussians with controllable objects. Our method can generate controllable and realistic 3D driving scenes, and extensive experiments validate the effectiveness of our model design. Code will be released upon acceptance.

Related Material


@InProceedings{Lu_2025_ICCV,
    author    = {Lu, Yifan and Ren, Xuanchi and Yang, Jiawei and Shen, Tianchang and Wu, Zhangjie and Gao, Jun and Wang, Yue and Chen, Siheng and Chen, Mike and Fidler, Sanja and Huang, Jiahui},
    title     = {InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {27272-27283}
}