Efficient Counterfactual Debiasing for Visual Question Answering

Camila Kolling, Martin More, Nathan Gavenski, Eduardo Pooch, Otávio Parraga, Rodrigo C. Barros; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022, pp. 3001-3010


Despite the success of neural architectures for Visual Question Answering (VQA), several recent studies have shown that VQA models are mostly driven by superficial correlations learned by exploiting undesired priors within training datasets. They often lack sufficient image grounding or over-rely on textual information, failing to capture knowledge from the images. This hurts their generalization to test sets with slight changes in the distribution of facts. To address this issue, some bias-mitigation methods rely on new training procedures that synthesize counterfactual samples by masking critical objects within the images and words within the questions, while also changing the corresponding ground truth. We propose a novel model-agnostic counterfactual training procedure, namely Efficient Counterfactual Debiasing (ECD), in which we introduce a new negative answer-assignment mechanism that exploits the probability distribution of the answers based on their frequencies, as well as an improved counterfactual sample synthesizer. Our experiments demonstrate that ECD is a simple, computationally-efficient counterfactual sample-synthesizing training procedure that establishes itself as the new state-of-the-art for unbiased VQA.
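As an illustration of the frequency-based negative answer-assignment idea described above, the sketch below samples a replacement answer for a counterfactual sample in proportion to how often each answer appears in the training set, excluding the original ground truth. This is a minimal, hypothetical sketch of the general mechanism, not the authors' exact implementation; the function names and the tiny toy answer list are assumptions for illustration.

```python
import random
from collections import Counter

def build_answer_distribution(train_answers):
    """Count how often each answer occurs in the training set."""
    return Counter(train_answers)

def assign_negative_answer(freq, original_answer, rng=None):
    """Sample a 'negative' answer for a counterfactual sample,
    weighted by training-set frequency, excluding the ground truth.

    Note: this is an illustrative sketch of frequency-weighted
    assignment, not the exact ECD procedure.
    """
    rng = rng or random.Random()
    candidates = [a for a in freq if a != original_answer]
    weights = [freq[a] for a in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Toy example: answers observed in a (hypothetical) training set.
train_answers = ["yes", "yes", "no", "2", "yes", "no", "red"]
freq = build_answer_distribution(train_answers)
neg = assign_negative_answer(freq, "yes", rng=random.Random(0))
print(neg)  # a frequency-weighted sample from {"no", "2", "red"}
```

Because frequent answers are sampled more often, the counterfactual ground truth stays plausible under the dataset's answer prior while still differing from the original label.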

Related Material

@InProceedings{Kolling_2022_WACV,
    author    = {Kolling, Camila and More, Martin and Gavenski, Nathan and Pooch, Eduardo and Parraga, Ot\'avio and Barros, Rodrigo C.},
    title     = {Efficient Counterfactual Debiasing for Visual Question Answering},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2022},
    pages     = {3001-3010}
}