UFO: Unifying Feed-Forward and Optimization-based Methods for Large Driving Scene Modeling

Tan, Kaiyuan; Shen, Yingying; Zhu, Ziyue; Tu, Mingfei; Zhu, Haohui; Sun, Haiyang; Wang, Bing; Chen, Guang; Ye, Hangjun

Kaiyuan Tan, Yingying Shen, Ziyue Zhu, Mingfei Tu, Haohui Zhu, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 21849-21859

Abstract

Dynamic driving scene modeling is critical for autonomous driving simulation and closed-loop learning. While recent feed-forward methods offer fast inference through data-driven priors, they struggle with long-range driving sequences due to quadratic complexity in sequence length and restrictive assumptions for dynamic objects. We propose UFO, a novel recurrent paradigm that combines the strengths of optimization-based and feed-forward methods for efficient long-range 4D modeling. Our approach maintains a persistent 4D scene representation composed of scene tokens that are progressively refined as new frames arrive, enabling future observations to correct earlier uncertain predictions. A visibility-based filtering mechanism exploits the locality of 3D-to-pixel correspondence, reducing complexity from quadratic to near-linear in sequence length. For dynamic objects, we introduce object pose-guided modeling with learned lifespans, enabling complex long-range motion without kinematic assumptions. Experiments on the Waymo Open Dataset demonstrate that UFO significantly outperforms both per-scene optimization and feed-forward baselines across 2s, 8s, and zero-shot 16s sequences.
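As an illustration of the visibility-based filtering idea only, the following minimal sketch keeps the scene tokens whose 3D anchor projects inside the current image. It is not the authors' implementation. The function name `visible_token_indices`, the assumption that each token has a 3D position already expressed in the camera frame, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and the `near` threshold are all hypothetical and do not come from the paper. The intuition, under these assumptions, is that if each frame interacts only with the tokens it can see, the per-frame cost depends on the size of that visible subset rather than on the full token set.

```python
from typing import List, Sequence, Tuple

# Hypothetical token anchor: (x, y, z) in the camera frame, z pointing forward.
Point3 = Tuple[float, float, float]


def visible_token_indices(
    positions_cam: Sequence[Point3],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    width: int,
    height: int,
    near: float = 0.1,
) -> List[int]:
    """Return indices of tokens whose 3D anchor projects inside the image.

    Assumes a simple pinhole camera; this is an illustrative stand-in for
    a visibility test, not the mechanism described in the paper.
    """
    visible: List[int] = []
    for i, (x, y, z) in enumerate(positions_cam):
        # Skip points behind the camera or closer than the near threshold.
        if z <= near:
            continue
        # Pinhole projection to pixel coordinates.
        u = fx * x / z + cx
        v = fy * y / z + cy
        if 0.0 <= u < width and 0.0 <= v < height:
            visible.append(i)
    return visible


# Toy example: four token anchors, one 1920x1280 camera.
tokens = [
    (0.0, 0.0, 10.0),   # straight ahead
    (50.0, 0.0, 10.0),  # far off to the side
    (0.0, 0.0, -5.0),   # behind the camera
    (1.0, -0.5, 4.0),   # slightly right and up
]
kept = visible_token_indices(
    tokens, fx=1000.0, fy=1000.0, cx=960.0, cy=640.0, width=1920, height=1280
)
print(kept)
```

In this toy setup only the kept tokens would be passed to whatever per-frame update is used, so tokens far from the current view add no cost for that frame.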

Related Material

[bibtex]

@InProceedings{Tan_2026_CVPR,
    author    = {Tan, Kaiyuan and Shen, Yingying and Zhu, Ziyue and Tu, Mingfei and Zhu, Haohui and Sun, Haiyang and Wang, Bing and Chen, Guang and Ye, Hangjun},
    title     = {UFO: Unifying Feed-Forward and Optimization-based Methods for Large Driving Scene Modeling},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {21849-21859}
}