VLAAD: Vision and Language Assistant for Autonomous Driving

SungYeon Park, MinJae Lee, JiHyuk Kang, Hahyeon Choi, Yoonah Park, Juhwan Cho, Adam Lee, DongKyu Kim; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2024, pp. 980-987

Abstract


While interpretable decision-making is pivotal in autonomous driving, research integrating natural language models remains relatively untapped. To address this, we introduce a multi-modal instruction tuning dataset that facilitates language models in learning visual instructions across diverse driving scenarios. The dataset encompasses three primary tasks: conversation, detailed description, and complex reasoning. Capitalizing on this dataset, we present VLAAD, a multi-modal LLM driving assistant. After being fine-tuned on our instruction-following dataset, VLAAD demonstrates proficient interpretive capabilities across a spectrum of driving situations. We release our work, dataset, and model to the public on GitHub. (https://github.com/sungyeonparkk/vision-assistant-for-driving)
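
To make the dataset description concrete, the snippet below is a minimal, hypothetical sketch of what a single visual instruction-tuning sample for one of the three task types might look like. The field names, file path, and dialogue content are illustrative assumptions, not the paper's actual schema; consult the linked repository for the real data format.

    # Hypothetical example of one instruction-tuning sample (illustrative only;
    # field names and values are assumptions, not the released VLAAD schema).
    sample = {
        "video": "clips/highway_merge_0042.mp4",  # assumed path to a driving clip
        "task": "complex_reasoning",              # conversation | detailed_description | complex_reasoning
        "conversations": [
            {"from": "human",
             "value": "<video>\nThe ego vehicle is merging onto the highway. "
                      "Is it safe to accelerate now, and why?"},
            {"from": "assistant",
             "value": "A truck is approaching quickly in the target lane, so the "
                      "ego vehicle should let it pass before accelerating."},
        ],
    }

Each sample pairs a driving video with an instruction-response exchange, and the fine-tuned multi-modal LLM learns to produce the assistant turn conditioned on the visual input.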

Related Material


[pdf]
[bibtex]
@InProceedings{Park_2024_WACV,
    author    = {Park, SungYeon and Lee, MinJae and Kang, JiHyuk and Choi, Hahyeon and Park, Yoonah and Cho, Juhwan and Lee, Adam and Kim, DongKyu},
    title     = {VLAAD: Vision and Language Assistant for Autonomous Driving},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {January},
    year      = {2024},
    pages     = {980-987}
}