Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing

Goonmeet Bajaj, Bortik Bandyopadhyay, Daniel Schmidt, Pranav Maneriker, Christopher Myers, Srinivasan Parthasarathy; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 386-387

Abstract

Traditional Visual Question Answering (VQA) datasets typically contain questions related to the spatial information of objects, object attributes, or general scene questions. Recently, researchers have recognized the need to better balance such datasets in order to reduce a system's reliance on memorized linguistic features and statistical biases while aiming for enhanced visual understanding. However, it is unclear whether any latent patterns exist that could quantify and explain the failures of VQA models. As an initial step toward better quantifying our understanding of VQA model performance, we use a taxonomy of Knowledge Gaps (KGs) to tag questions with one or more KG types. Each KG describes the reasoning ability needed to resolve a question, so a failure to resolve a gap indicates that the model lacks the required reasoning ability. After identifying the KGs for each question, we examine the skew in the distribution of questions across KG types. We then introduce a targeted question generation model that reduces this skew by generating new types of questions for an image.
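To make the skew analysis concrete, the sketch below tags each question with KG types and measures how balanced the resulting distribution is. It is a minimal Python illustration, not the paper's implementation: the KG type names, the keyword heuristics used for tagging, and the normalized-entropy balance measure are all assumptions introduced here.

from collections import Counter
import math

# Hypothetical subset of KG types with keyword heuristics. These names and
# keywords are illustrative stand-ins, not the paper's actual taxonomy or
# tagging procedure.
KG_KEYWORDS = {
    "spatial": ["left", "right", "behind", "next to", "on top of"],
    "attribute": ["color", "shape", "size", "made of"],
    "counting": ["how many", "number of"],
    "scene": ["where", "what room", "what place"],
}

def tag_question(question):
    """Tag a question with one or more KG types via keyword matching."""
    q = question.lower()
    tags = [kg for kg, kws in KG_KEYWORDS.items() if any(k in q for k in kws)]
    return tags or ["other"]

def kg_distribution(questions):
    """Count questions per KG type; a question may contribute to several KGs."""
    counts = Counter()
    for q in questions:
        counts.update(tag_question(q))
    return counts

def balance_score(counts):
    """Normalized entropy of the KG distribution: 1.0 means perfectly
    balanced across KG types; values near 0 indicate heavy skew."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts)) if len(counts) > 1 else 0.0

if __name__ == "__main__":
    questions = [
        "What color is the car?",
        "How many dogs are on the left?",
        "What is behind the sofa?",
        "What room is this?",
    ]
    dist = kg_distribution(questions)
    print(dict(dist), "balance = %.2f" % balance_score(dist))

On these sample questions, the script reports per-KG counts and a balance score near 1.0 when questions are spread evenly across KG types; a score near 0 flags the kind of skew the targeted question generation model is meant to reduce.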

Related Material

[bibtex]
@InProceedings{Bajaj_2020_CVPR_Workshops,
author = {Bajaj, Goonmeet and Bandyopadhyay, Bortik and Schmidt, Daniel and Maneriker, Pranav and Myers, Christopher and Parthasarathy, Srinivasan},
title = {Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2020},
pages = {386-387}
}