OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving
Abstract
Since the advent of Multimodal Large Language Models (MLLMs), these models have made a significant impact across a wide range of real-world applications, particularly in Autonomous Driving (AD). Their ability to process complex visual data and reason about intricate driving scenarios has paved the way for a new paradigm in end-to-end AD systems. However, progress in developing end-to-end models for AD has been slow, as existing fine-tuning methods demand substantial resources, including extensive computational power, large-scale datasets, and significant funding. Drawing inspiration from recent advancements in inference computing, we propose OpenEMMA, an open-source end-to-end framework based on MLLMs. By incorporating Chain-of-Thought reasoning, OpenEMMA achieves significant improvements over the baseline when leveraging a diverse range of MLLMs. Furthermore, OpenEMMA demonstrates effectiveness, generalizability, and robustness across a variety of challenging driving scenarios, offering a more efficient and effective approach to autonomous driving. We release all code at https://github.com/taco-group/OpenEMMA.
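The core idea, prompting an MLLM to reason step by step about a driving scene before committing to a trajectory, can be illustrated with a minimal sketch. This is not the OpenEMMA pipeline itself; the model name, prompt wording, and waypoint output format below are illustrative assumptions, using the OpenAI Python client as one example of an MLLM backend.

```python
# Minimal sketch of Chain-of-Thought prompting for end-to-end driving.
# NOT the official OpenEMMA implementation; model, prompt, and output
# format are illustrative assumptions (see the OpenEMMA repo for the real code).
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def predict_waypoints(front_camera_jpeg: str, ego_speed_mps: float) -> str:
    """Ask a vision-capable chat model to describe the scene, reason step by
    step, and only then emit future waypoints in ego-vehicle coordinates."""
    with open(front_camera_jpeg, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    # Chain-of-Thought template: perception -> intent -> decision -> trajectory.
    cot_prompt = (
        f"The ego vehicle is currently moving at {ego_speed_mps:.1f} m/s.\n"
        "Step 1: Describe the critical objects in the scene.\n"
        "Step 2: Describe their likely intent and motion.\n"
        "Step 3: Decide a high-level driving action.\n"
        "Step 4: Output 10 future waypoints as (x, y) in meters, one per line."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model could be swapped in
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": cot_prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    # The final step of the response is parsed downstream into a trajectory.
    return response.choices[0].message.content
```

Because the intermediate reasoning steps are produced in natural language, the same template can be reused across different MLLM backends without any fine-tuning, which is what makes this kind of inference-time approach comparatively cheap.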
Related Material
[pdf] [arXiv] [bibtex]
@InProceedings{Xing_2025_WACV,
    author    = {Xing, Shuo and Qian, Chengyuan and Wang, Yuping and Hua, Hongyuan and Tian, Kexin and Zhou, Yang and Tu, Zhengzhong},
    title     = {OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {February},
    year      = {2025},
    pages     = {1001-1009}
}