A Latent-Centric Perspective on World Models for Autonomous Driving: Taxonomy, Evaluation, and Challenges

Rongxiang Zeng, Yongqi Dong, Zhida Shao; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026, pp. 4510-4519

Abstract


Latent representations have become a core substrate for world models in autonomous driving, yet a unified framework linking representation design to closed-loop safety is still missing. This paper presents a latent-centric taxonomy that organizes recent world models into three paradigms: spatiotemporal neural simulation, generative data synthesis and augmentation, and latent decision-making with cognitive reasoning. We compare these paradigms along three method-level attributes (latent structure, dynamics model, and input-output interface) and highlight cross-cutting design considerations that shape rollout fidelity, closed-loop suitability, and deployment robustness. To characterize the mismatch between open-loop predictive accuracy and closed-loop safety, we introduce two diagnostic metrics: Closed-loop Safety Gap (CSG) and Deliberation Cost (DC). We further identify five recurring bottlenecks in current research: extended-horizon drift, real-time computational constraints, fragile domain transfer, limited causal interpretability, and scarce safety-critical data. Overall, we argue that future progress depends on moving beyond perceptual realism alone toward world models that are decision-relevant, computationally tractable, and verifiable under closed-loop interaction.

Related Material


[bibtex]
@InProceedings{Zeng_2026_CVPR,
    author    = {Zeng, Rongxiang and Dong, Yongqi and Shao, Zhida},
    title     = {A Latent-Centric Perspective on World Models for Autonomous Driving: Taxonomy, Evaluation, and Challenges},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2026},
    pages     = {4510-4519}
}