@InProceedings{Chennaka_2026_WACV,
    author    = {Chennaka, Sairam and Nidamanuri, Jaswanth},
    title     = {Trust-Guided Multimodal LLM Integration with Reinforcement Learning for Autonomous Driving},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {March},
    year      = {2026},
    pages     = {1780-1788}
}
Trust-Guided Multimodal LLM Integration with Reinforcement Learning for Autonomous Driving
Abstract
Large language models (LLMs) offer semantic reasoning capabilities valuable for autonomous driving, but their output reliability remains uncertain and scenario-dependent. This work investigates integrating multimodal LLMs with reinforcement learning (RL) by introducing a learned trust-gating mechanism that estimates LLM output reliability from sensor state. The proposed approach combines a three-stage LLM reasoning pipeline (perception, planning, and control) with transformer-based multimodal sensor fusion. A comprehensive ablation study across 2,250 episodes (15 configurations, 3 random seeds, and 10 diverse scenarios) demonstrates that LLM-guided reward shaping achieves a 29.3% performance improvement and a 47.8% collision reduction compared to baseline RL. Trust floor sensitivity analysis justifies t_min = 0.3 as optimal, while fallback quantification reveals that rule-based systems handle 45% of decisions in perception-degraded scenarios. Reward attribution analysis confirms that LLM alignment contributes an 11.8% improvement independent of trust gating. Further analysis reveals that LLMs provide substantial value for high-level semantic reasoning (scene understanding: +46% for pedestrian crossing) but limited benefit for routine control tasks (+10% for highway cruise). The learned trust-gating mechanism reduces output variance by 74%, indicating that principled confidence estimation enables safer integration of external semantic guidance in safety-critical control.
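The abstract does not specify how the trust gate, the trust floor, or the reward shaping are computed. The following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that trust is a scalar in [0, 1] produced by a logistic model over sensor features, that the floor clamps trust from below at t_min, and that the LLM alignment term is added to the environment reward scaled by the gated trust. The function names, the logistic estimator, and the additive shaping form are all hypothetical; only the value t_min = 0.3 comes from the abstract.

```python
import math

T_MIN = 0.3  # trust floor value reported in the abstract


def estimate_trust(sensor_features, weights, bias):
    """Hypothetical trust estimator: logistic model over sensor features.

    The paper's estimator is learned; this linear form is an assumption.
    """
    z = sum(w * x for w, x in zip(weights, sensor_features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def gated_trust(raw_trust, t_min=T_MIN):
    """Assumed floor semantics: clamp trust into the interval [t_min, 1]."""
    return min(1.0, max(t_min, raw_trust))


def shaped_reward(env_reward, llm_alignment, raw_trust, t_min=T_MIN):
    """Assumed shaping: environment reward plus trust-weighted LLM alignment."""
    return env_reward + gated_trust(raw_trust, t_min) * llm_alignment
```

Under these assumptions, a raw trust of 0.1 is raised to the floor of 0.3, so the LLM alignment term still contributes a reduced share of the reward rather than being discarded. An alternative reading, in which trust below the floor triggers the rule-based fallback instead, is equally compatible with the abstract.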
