@InProceedings{Luo_2026_CVPR,
    author    = {Luo, Yuechen and Li, Fang and Chen, Qimao and Xu, Shaoqing and Liu, Jiaxin and Song, Ziying and Yang, Zhi-xin and Wen, Fuxi},
    title     = {Unleashing VLA Potentials in Autonomous Driving via Explicit Learning from Failures},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {24833-24842}
}
Unleashing VLA Potentials in Autonomous Driving via Explicit Learning from Failures
Abstract
Vision-Language-Action (VLA) models for autonomous driving often hit a performance plateau during Reinforcement Learning (RL) optimization. This stagnation arises because exploration is constrained by the preceding Supervised Fine-Tuning (SFT), leading to "persistent failures" in long-tail scenarios. In these critical situations, all explored actions yield a zero-value driving score. Such an information-sparse reward signals a failure but does not identify its root cause, which may be incorrect planning, flawed reasoning, or poor trajectory execution. To address this limitation, we propose **VLA** with **E**xplicit **L**earning from **F**ailures (**ELF-VLA**), a framework that augments RL with structured diagnostic feedback. Instead of relying on a vague scalar reward, our method produces detailed, interpretable reports that identify the specific failure mode. The VLA policy then leverages this explicit feedback to generate a **Feedback-Guided Refinement**. By injecting these corrected, high-reward samples back into the RL training batch, our approach provides a targeted gradient that enables the policy to solve critical scenarios that unguided exploration cannot. Extensive experiments demonstrate that our method unlocks the latent capabilities of VLA models, achieving state-of-the-art (SOTA) performance on the public NAVSIM benchmark in overall PDMS, EPDMS, and high-level planning accuracy.
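
The point about zero-value scores and injected refinements can be illustrated with a toy calculation. The sketch below is not the paper's implementation: it assumes a group-normalized advantage (reward minus group mean, divided by group standard deviation), which the abstract does not specify, and the function names and the 0.9 refined score are hypothetical. It only shows that a group of all-zero rewards carries no learning signal under that assumption, while adding one corrected, high-reward sample to the same group does.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Assumed scheme: advantage = (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def inject_refinement(rewards, refined_reward):
    """Hypothetical helper: add a refined sample's reward only when
    every explored rollout in the group scored zero."""
    if all(r == 0.0 for r in rewards):
        return list(rewards) + [refined_reward]
    return list(rewards)


# A "persistent failure" group: every explored action scores zero.
failed_group = [0.0, 0.0, 0.0, 0.0]

# Without refinement, all advantages are zero, so there is no gradient signal.
baseline = group_advantages(failed_group)

# With one refined sample (illustrative score 0.9), the refined sample gets a
# positive advantage and the failed rollouts get negative ones.
augmented = group_advantages(inject_refinement(failed_group, 0.9))
```

In this toy setting the refined sample is the only source of a nonzero advantage, which is one way to read the abstract's "targeted gradient".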

