BibTeX
@InProceedings{Luo_2025_WACV,
  author    = {Luo, Xuewen and Ding, Fan and Panda, Rishikesh and Chen, Ruiqi and Loo, Junnyong and Zhang, Shuyun},
  title     = {``What's Happening'' - A Human-centered Multimodal Interpreter Explaining the Actions of Autonomous Vehicles},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
  month     = {February},
  year      = {2025},
  pages     = {1163-1170}
}
"What's Happening"- A Human-centered Multimodal Interpreter Explaining the Actions of Autonomous Vehicles
Abstract
Public distrust of self-driving cars is growing. Studies emphasize the need to interpret the behavior of these vehicles for passengers in order to promote trust in autonomous systems. Interpreters can enhance trust by improving transparency and reducing perceived risk. However, current solutions often lack a human-centric approach to integrating multimodal interpretations. This paper introduces a novel Human-centered Multimodal Interpreter (HMI) system that leverages human preferences to provide visual, textual, and auditory feedback. The system combines a visual interface, featuring a Bird's Eye View (BEV) map and text display, with voice interaction powered by a fine-tuned large language model (LLM). Our user study, involving diverse participants, demonstrated that the HMI system significantly boosts passenger trust in AVs, increasing average trust levels by over 8%, with trust in ordinary environments rising by up to 30%. These results underscore the potential of the HMI system to improve the acceptance and reliability of autonomous vehicles by providing clear, real-time, and context-sensitive explanations of vehicle actions.
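The paper does not publish code, but the three-channel feedback pattern the abstract describes (a BEV map overlay, an on-screen text display, and LLM-driven voice output) can be illustrated with a minimal sketch. Everything below is a hypothetical assumption for illustration: the class and function names (SceneState, explain_action, HMIInterpreter) are not the authors' API, and the template string stands in for the paper's fine-tuned LLM.

```python
# Hypothetical sketch of the multimodal feedback loop described in the
# abstract. All names here are illustrative assumptions, not the
# authors' implementation.
from dataclasses import dataclass


@dataclass
class SceneState:
    """Minimal stand-in for the AV's perceived state at one timestep."""
    action: str       # e.g. "braking", "changing lanes to the left"
    reason: str       # e.g. "a pedestrian is crossing ahead"
    environment: str  # e.g. "ordinary", "adverse weather"


def explain_action(state: SceneState) -> str:
    """Template-based placeholder for the paper's fine-tuned LLM, which
    would generate a context-sensitive explanation from similar inputs."""
    return f"The vehicle is {state.action} because {state.reason}."


class HMIInterpreter:
    """Routes one explanation to the three channels the paper names:
    a BEV map overlay, an on-screen text display, and voice output."""

    def update(self, state: SceneState) -> None:
        text = explain_action(state)
        self.render_bev_annotation(state)  # visual channel (BEV map)
        self.show_text(text)               # textual channel
        self.speak(text)                   # auditory channel

    def render_bev_annotation(self, state: SceneState) -> None:
        print(f"[BEV] highlight: {state.reason}")

    def show_text(self, text: str) -> None:
        print(f"[TEXT] {text}")

    def speak(self, text: str) -> None:
        # A real system would call a text-to-speech engine here.
        print(f"[VOICE] {text}")


if __name__ == "__main__":
    hmi = HMIInterpreter()
    hmi.update(SceneState("braking",
                          "a pedestrian is crossing ahead",
                          "ordinary"))
```

Fanning one explanation out to all three channels from a single state, rather than generating each channel independently, keeps the visual, textual, and auditory feedback consistent, which is one plausible reading of how a human-centered interpreter would maintain passenger trust.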