@InProceedings{Borhan_2026_CVPR,
    author    = {Borhan, Uddin Md. and Raza, Arif and Lin, Zhiliang and Wang, Lu and Li, Jianqiang and Chen, Jie},
    title     = {Reliable Policy Transfer for Safety-Aware End-to-End Driving with Deep Reinforcement Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {32134-32143}
}
Reliable Policy Transfer for Safety-Aware End-to-End Driving with Deep Reinforcement Learning
Abstract
End-to-End (E2E) Reinforcement Learning (RL) for autonomous driving still struggles with safety and generalization under distribution shift, as perception-heavy encoders, sparse rewards, and ad hoc uncertainty handling yield brittle closed-loop behavior. This work introduces a unified Deep RL (DRL) framework built around a control-layer reliability interface where the same uncertainty signal informs relational attention, gates policy entropy, and regularizes transfer alignment. An ego-centric relational graph encodes agent influence via uncertainty-weighted attention over kinematics, lane geometry, and semantics, producing a compact control state. A multi-objective differentiable reward shapes safety, progress, and comfort with an uncertainty term. Aleatoric and epistemic uncertainty, captured through per-edge heteroscedastic variance and a critic ensemble, modulate policy entropy for risk-aware exploration. A causal-semantic transfer objective aligns actions, attention, and uncertainty statistics across domains with meta-learned initialization for few-shot adaptation. In closed-loop urban driving across varied towns, traffic, and weather, the framework improves success rate, reduces infractions, and achieves higher time-to-conflict and lower lateral deviation relative to strong baselines.
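The abstract does not give the rule by which uncertainty modulates policy entropy, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that epistemic uncertainty is measured as the variance of Q-estimates across a critic ensemble, that aleatoric variance is supplied externally, and that the entropy weight shrinks as total uncertainty grows (the opposite direction is equally conceivable). The names `epistemic_variance`, `entropy_coefficient`, `base_alpha`, and `sensitivity` are hypothetical.

```python
from statistics import pvariance


def epistemic_variance(q_estimates):
    """Disagreement among ensemble critics for one state-action pair."""
    return pvariance(q_estimates)


def entropy_coefficient(base_alpha, aleatoric_var, epistemic_var, sensitivity=1.0):
    """Scale the entropy weight down as total uncertainty grows.

    Assumption: higher uncertainty means less exploration. The abstract
    does not specify the direction or the functional form.
    """
    total_uncertainty = aleatoric_var + epistemic_var
    return base_alpha / (1.0 + sensitivity * total_uncertainty)


# Hypothetical critic outputs for the same state-action pair.
agreeing_critics = [1.0, 1.0, 1.0, 1.0]
disagreeing_critics = [0.0, 2.0, 0.0, 2.0]

alpha_confident = entropy_coefficient(
    base_alpha=0.2,
    aleatoric_var=0.0,
    epistemic_var=epistemic_variance(agreeing_critics),
)
alpha_uncertain = entropy_coefficient(
    base_alpha=0.2,
    aleatoric_var=0.0,
    epistemic_var=epistemic_variance(disagreeing_critics),
)
```

In this sketch the entropy weight is halved when the critics disagree with unit variance, which would make the policy explore less where its value estimates are least reliable.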