@InProceedings{Choudhary_2025_WACV,
  author    = {Choudhary, Neha and Goyal, Poonam and Siwatch, Devashish and Chandak, Atharva and Mahajan, Harsh and Khurana, Varun and Kumar, Yaman},
  title     = {AdQuestA: Knowledge-Guided Visual Question Answer Framework for Advertisements},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {5812-5821}
}
AdQuestA: Knowledge-Guided Visual Question Answer Framework for Advertisements
Abstract
In the rapidly evolving landscape of digital marketing, effective customer engagement through advertisements is crucial for brands. Computational understanding of ads is therefore pivotal for recommendation, authoring, and customer-behaviour simulation. Despite advancements in knowledge-guided visual question answering (VQA) models, existing frameworks often lack domain-specific responses and suffer from a dearth of benchmark datasets for advertisements. To address this gap, we introduce ADVQA, the first dataset for ad-related VQA, sourced from Facebook and X (Twitter), which facilitates further research in ad comprehension. It comprises open-ended questions and detailed context obtained automatically from web articles. Moreover, we present AdQuestA, a novel multimodal framework for knowledge-guided open-ended question answering tailored to advertisements. AdQuestA leverages Retrieval-Augmented Generation (RAG) to obtain question-aware ad context as explicit knowledge, alongside image-grounded implicit knowledge, effectively exploiting the inherent relationships between them for reasoning. Extensive experiments corroborate its efficacy, yielding state-of-the-art performance on the ADVQA dataset and even surpassing 10x larger models such as GPT-4 on this task. Our framework not only enhances understanding of ad content but also advances the broader landscape of knowledge-guided VQA models.
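The question-aware retrieval step that RAG performs here can be illustrated with a minimal sketch. This is not the authors' implementation: it scores candidate ad-context passages (e.g. extracted from web articles) against the question with a bag-of-words cosine similarity and keeps the top-k as explicit knowledge; all names (`retrieve_context`, `passages`) and the scoring function are illustrative assumptions, whereas a real pipeline would use a learned dense retriever.

```python
# Hypothetical sketch of question-aware passage retrieval for a RAG pipeline.
# Scores each candidate context passage against the question and returns the
# top-k matches to be fed to the answer generator as explicit knowledge.
import math
from collections import Counter


def _vec(text: str) -> Counter:
    # Bag-of-words term counts (stand-in for a real text encoder).
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_context(question: str, passages: list[str], k: int = 2) -> list[str]:
    # Rank passages by similarity to the question; keep the k best.
    q = _vec(question)
    ranked = sorted(passages, key=lambda p: _cosine(q, _vec(p)), reverse=True)
    return ranked[:k]


passages = [
    "The ad promotes a new electric car with zero emissions.",
    "A soft drink brand sponsors a summer music festival.",
    "The electric car ad highlights fast charging and long range.",
]
print(retrieve_context("What does the electric car ad emphasize?", passages, k=2))
```

The retrieved passages would then be concatenated with the question and image features as input to the generator, which is the essence of the explicit-knowledge path described in the abstract.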