OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving
Abstract
Since the advent of Multimodal Large Language Models (MLLMs), these models have made a significant impact across a wide range of real-world applications, particularly in Autonomous Driving (AD). Their ability to process complex visual data and reason about intricate driving scenarios has paved the way for a new paradigm in end-to-end AD systems. However, progress in developing end-to-end models for AD has been slow, as existing fine-tuning methods demand substantial resources, including extensive computational power, large-scale datasets, and significant funding. Drawing inspiration from recent advancements in inference computing, we propose OpenEMMA, an open-source end-to-end framework based on MLLMs. By incorporating Chain-of-Thought reasoning, OpenEMMA achieves significant improvements over the baseline when leveraging a diverse range of MLLMs. Furthermore, OpenEMMA demonstrates effectiveness, generalizability, and robustness across a variety of challenging driving scenarios, offering a more efficient and effective approach to autonomous driving. We release all code at https://github.com/taco-group/OpenEMMA.
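The core idea, prompting an MLLM to reason step by step about a driving scene before committing to a trajectory, can be illustrated with a minimal sketch. This is not the OpenEMMA pipeline itself; the model name, prompt wording, and waypoint output format below are illustrative assumptions, using the OpenAI Python client as one example of an MLLM backend.

```python
# Minimal sketch of Chain-of-Thought prompting for end-to-end driving.
# NOT the official OpenEMMA implementation; model, prompt, and output
# format are illustrative assumptions (see the OpenEMMA repo for the real code).
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def predict_waypoints(front_camera_jpeg: str, ego_speed_mps: float) -> str:
    """Ask a vision-capable chat model to describe the scene, reason step by
    step, and only then emit future waypoints in ego-vehicle coordinates."""
    with open(front_camera_jpeg, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    # Chain-of-Thought template: perception -> intent -> decision -> trajectory.
    cot_prompt = (
        f"The ego vehicle is currently moving at {ego_speed_mps:.1f} m/s.\n"
        "Step 1: Describe the critical objects in the scene.\n"
        "Step 2: Describe their likely intent and motion.\n"
        "Step 3: Decide a high-level driving action.\n"
        "Step 4: Output 10 future waypoints as (x, y) in meters, one per line."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model could be swapped in
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": cot_prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    # The final step of the response is parsed downstream into a trajectory.
    return response.choices[0].message.content
```

Because the intermediate reasoning steps are produced in natural language, the same template can be reused across different MLLM backends without any fine-tuning, which is what makes this kind of inference-time approach comparatively cheap.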
Related Material
[pdf] [arXiv] [bibtex]
@InProceedings{Xing_2025_WACV,
    author    = {Xing, Shuo and Qian, Chengyuan and Wang, Yuping and Hua, Hongyuan and Tian, Kexin and Zhou, Yang and Tu, Zhengzhong},
    title     = {OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {February},
    year      = {2025},
    pages     = {1001-1009}
}