@InProceedings{Park_2024_WACV,
  author    = {Park, SungYeon and Lee, MinJae and Kang, JiHyuk and Choi, Hahyeon and Park, Yoonah and Cho, Juhwan and Lee, Adam and Kim, DongKyu},
  title     = {VLAAD: Vision and Language Assistant for Autonomous Driving},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
  month     = {January},
  year      = {2024},
  pages     = {980-987}
}
VLAAD: Vision and Language Assistant for Autonomous Driving
Abstract
While interpretable decision-making is pivotal in autonomous driving, research integrating natural language models remains relatively untapped. To address this, we introduce a multi-modal instruction tuning dataset that facilitates language models in learning visual instructions across diverse driving scenarios. This dataset encompasses three primary tasks: conversation, detailed description, and complex reasoning. Capitalizing on this dataset, we present a multi-modal LLM driving assistant named VLAAD. After being fine-tuned on our instruction-following dataset, VLAAD demonstrates proficient interpretive capabilities across a spectrum of driving situations. We release our work, dataset, and model to the public on GitHub. (https://github.com/sungyeonparkk/vision-assistant-for-driving)