Position: Prospective of Autonomous Driving - Multimodal LLMs, World Models, Embodied Intelligence, AI Alignment, and Mamba

Ma, Yunsheng; Ye, Wenqian; Cui, Can; Zhang, Haiming; Xing, Shuo; Ke, Fucai; Wang, Jinhong; Miao, Chenglin; Chen, Jintai; Rezatofighi, Hamid; Li, Zhen; Zheng, Guangtao; Zheng, Chao; He, Tianjiao; Chandraker, Manmohan; Yaman, Burhaneddin; Ye, Xin; Zhao, Hang; Cao, Xu

Yunsheng Ma, Wenqian Ye, Can Cui, Haiming Zhang, Shuo Xing, Fucai Ke, Jinhong Wang, Chenglin Miao, Jintai Chen, Hamid Rezatofighi, Zhen Li, Guangtao Zheng, Chao Zheng, Tianjiao He, Manmohan Chandraker, Burhaneddin Yaman, Xin Ye, Hang Zhao, Xu Cao; Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops, 2025, pp. 1010-1026

Abstract

With the emergence of Generative AI, multimodal AI systems that leverage foundation models are beginning to demonstrate enormous potential for perceiving the real world, collecting new data, making decisions, and using tools as humans do. In recent years, the use of Large Language Models and World Models in autonomous driving has received widespread attention. However, despite their potential, there is still no comprehensive understanding of the key challenges, opportunities, and future applications of these new foundation models in driving systems. In this paper, we provide an outlook on this field, summarizing existing methods and exploring their limitations. In addition, we discuss the applicability of emerging approaches, such as Reinforcement Learning from Human Feedback and Mamba, to autonomous driving. Finally, we highlight open questions and offer insights into promising directions for future research. This paper is part of a living document that will be updated based on the LLVM-AD workshop series to reflect the latest developments in the field.

Related Material

@InProceedings{Ma_2025_WACV,
    author    = {Ma, Yunsheng and Ye, Wenqian and Cui, Can and Zhang, Haiming and Xing, Shuo and Ke, Fucai and Wang, Jinhong and Miao, Chenglin and Chen, Jintai and Rezatofighi, Hamid and Li, Zhen and Zheng, Guangtao and Zheng, Chao and He, Tianjiao and Chandraker, Manmohan and Yaman, Burhaneddin and Ye, Xin and Zhao, Hang and Cao, Xu},
    title     = {Position: Prospective of Autonomous Driving - Multimodal LLMs, World Models, Embodied Intelligence, AI Alignment, and Mamba},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {February},
    year      = {2025},
    pages     = {1010-1026}
}