Glimpse of MCQ based VQA in Road & Traffic Scenarios

Parthasarathy, Ambarish; R, Athira Krishnan; BG, Sumukha

Ambarish Parthasarathy, Athira Krishnan R, Sumukha BG; Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops, 2025, pp. 997-1000

Abstract

Multi-modal models have been a boon to the community over the last decade. The challenge poses a visual question answering (VQA) problem in the form of multiple-choice question (MCQ) answering in road and traffic scenarios. We approach this as a supervised classification task. We explore a range of models, from Transformer-based multi-modal models to recent Vision Language Models (VLMs). Although we experimented under time and compute constraints, the results suggest that the models could improve further if trained for more iterations. We also compare model sizes, given the limited compute available on the target edge device. Code is available at: https://github.com/Athirakr94/VQA.git

Related Material

[bibtex]

@InProceedings{Parthasarathy_2025_WACV,
    author    = {Parthasarathy, Ambarish and R, Athira Krishnan and BG, Sumukha},
    title     = {Glimpse of MCQ based VQA in Road \& Traffic Scenarios},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {February},
    year      = {2025},
    pages     = {997-1000}
}