Auto QA: The Question Is Not Only What, but Also Where

Sumit Kumar, Badri N. Patro, Vinay P. Namboodiri; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2022, pp. 272-281

Abstract

Visual Question Answering (VQA) can be a functionally relevant task when purposed as such. In this paper, we investigate and evaluate its efficacy for localization-based question answering, specifically in the context of autonomous driving, where this functionality is important. To this end, we provide a new dataset, Auto-QA. Built on top of the Argoverse dataset, it offers a truly multi-modal setting, with seven camera views per frame and LiDAR point-cloud data available for answering localization-based questions. We contribute localized-attention adaptations of the most popular VQA baselines and evaluate them on this task, and we also provide joint point-cloud and image-based baselines that perform well on it. We additionally analyze whether the attention modules of the image-based VQA baselines are accurate. In summary, this work thoroughly analyzes localization abilities through visual question answering for autonomous driving and provides a new benchmark task for this setting. Our best joint baseline model achieves 74.8% accuracy on the task. We release our dataset and the source code for our baseline modules at https://temporaryprojectpage.github.io/AUTO-QA/
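To make the abstract's notion of a joint point-cloud and image baseline with localized attention concrete, the sketch below shows one way such a model could be wired up in PyTorch. It is a minimal, hypothetical illustration only: the class name, feature dimensions, answer-vocabulary size, and the multiplicative fusion scheme are all assumptions, not the architecture described in the paper.

# Hypothetical sketch of a joint image + LiDAR VQA baseline with a simple
# localized-attention step. Module names, dimensions, and the fusion scheme
# are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointLocalizedVQA(nn.Module):
    def __init__(self, vocab_size=1000, num_answers=28,
                 d_model=256):
        super().__init__()
        # Question encoder: word embedding followed by a GRU.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.gru = nn.GRU(d_model, d_model, batch_first=True)
        # Projections for pre-extracted image region features (assumed 2048-d)
        # and raw LiDAR points (x, y, z) into a shared space.
        self.img_proj = nn.Linear(2048, d_model)
        self.pc_proj = nn.Linear(3, d_model)
        # Answer classifier over the fused representation.
        self.classifier = nn.Linear(d_model, num_answers)

    def attend(self, query, feats):
        # Localized attention: score each region/point against the question,
        # then pool the features with the resulting soft weights.
        scores = torch.einsum('bd,bnd->bn', query, feats)
        weights = F.softmax(scores, dim=1)
        return torch.einsum('bn,bnd->bd', weights, feats)

    def forward(self, question_tokens, region_feats, points):
        # question_tokens: (B, T) ints; region_feats: (B, R, 2048) pooled
        # over the camera views; points: (B, P, 3) LiDAR coordinates.
        _, h = self.gru(self.embed(question_tokens))
        q = h[-1]                                    # (B, d_model)
        img_ctx = self.attend(q, self.img_proj(region_feats))
        pc_ctx = self.attend(q, self.pc_proj(points))
        fused = q * (img_ctx + pc_ctx)               # simple multiplicative fusion
        return self.classifier(fused)

if __name__ == "__main__":
    model = JointLocalizedVQA()
    logits = model(torch.randint(0, 1000, (2, 12)),   # toy question tokens
                   torch.randn(2, 36, 2048),          # toy region features
                   torch.randn(2, 1024, 3))           # toy LiDAR points
    print(logits.shape)  # torch.Size([2, 28])

The attention weights produced in attend() are also what an attention-accuracy evaluation of the kind mentioned in the abstract would inspect, by comparing them against ground-truth object locations.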

Related Material
BibTeX:

@InProceedings{Kumar_2022_WACV,
  author    = {Kumar, Sumit and Patro, Badri N. and Namboodiri, Vinay P.},
  title     = {Auto QA: The Question Is Not Only What, but Also Where},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
  month     = {January},
  year      = {2022},
  pages     = {272-281}
}