Visual Question Generation as Dual Task of Visual Question Answering
Yikang Li, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang, Ming Zhou; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6116-6124
Abstract
Visual question answering (VQA) and visual question generation (VQG) are two trending topics in computer vision, but they are usually explored separately despite their intrinsic complementary relationship. In this paper, we propose an end-to-end unified model, the Invertible Question Answering Network (iQAN), which introduces question generation as a dual task of question answering to improve VQA performance. With our proposed invertible bilinear fusion module and parameter-sharing scheme, iQAN can perform VQA and its dual task, VQG, simultaneously. By jointly training the two tasks with our proposed dual regularizers (termed Dual Training), our model gains a better understanding of the interactions among images, questions, and answers. After training, iQAN can take either a question or an answer as input and output the counterpart. Evaluated on the CLEVR and VQA2 datasets, iQAN improves the top-1 accuracy of the prior-art MUTAN VQA method by 1.33% and 0.88% (absolute). We also show that our dual training framework consistently improves the performance of many popular VQA architectures.
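The dual-task idea in the abstract can be illustrated with a minimal PyTorch sketch (not the authors' code): one shared fusion core is used in both directions, so VQA (image + question -> answer) and VQG (image + answer -> question) share parameters, and a duality regularizer ties the two directions together. All module names, dimensions, the simple Hadamard-product fusion standing in for the paper's invertible bilinear (MUTAN-style) module, the one-step question head standing in for a full decoder, and the loss weighting are illustrative assumptions.

import torch
import torch.nn as nn

class SharedBilinearFusion(nn.Module):
    """Low-rank bilinear fusion of an image feature with a text feature.

    Reused in both directions: fusing (image, question) to predict the
    answer, and fusing (image, answer) to generate the question. Sharing
    this module is a stand-in for iQAN's parameter-sharing scheme.
    """
    def __init__(self, dim: int = 512):
        super().__init__()
        self.img_proj = nn.Linear(dim, dim)
        self.txt_proj = nn.Linear(dim, dim)

    def forward(self, img: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # Hadamard-product fusion: a common low-rank bilinear approximation.
        return self.img_proj(img) * self.txt_proj(txt)

class DualVQAVQG(nn.Module):
    def __init__(self, dim: int = 512, n_answers: int = 1000, vocab: int = 10000):
        super().__init__()
        self.fusion = SharedBilinearFusion(dim)           # shared by both tasks
        self.answer_head = nn.Linear(dim, n_answers)      # VQA: classify the answer
        self.answer_embed = nn.Embedding(n_answers, dim)  # VQG: embed the given answer
        self.question_head = nn.Linear(dim, vocab)        # VQG: toy one-step "decoder"

    def forward(self, img, q_feat, ans_idx):
        # VQA direction: fuse image with the question, predict the answer.
        h_qa = self.fusion(img, q_feat)
        answer_logits = self.answer_head(h_qa)
        # VQG direction: fuse image with the answer, predict the question.
        h_qg = self.fusion(img, self.answer_embed(ans_idx))
        question_logits = self.question_head(h_qg)
        # Duality regularizer (illustrative): pull the two fused hidden states
        # of the same (image, question, answer) triple toward each other.
        dual_reg = (h_qa - h_qg).pow(2).mean()
        return answer_logits, question_logits, dual_reg

# Toy usage with random features; lambda = 0.1 is an assumed weighting.
model = DualVQAVQG()
img, q_feat = torch.randn(8, 512), torch.randn(8, 512)
ans_idx = torch.randint(0, 1000, (8,))
a_logits, q_logits, reg = model(img, q_feat, ans_idx)
loss = nn.functional.cross_entropy(a_logits, ans_idx) + 0.1 * reg

Because the fusion core is shared, training either direction updates the same parameters, which is what lets the trained model take either a question or an answer as input and produce the counterpart.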
Related Material
[pdf]
[arXiv]
[video]
[bibtex]
@InProceedings{Li_2018_CVPR,
author = {Li, Yikang and Duan, Nan and Zhou, Bolei and Chu, Xiao and Ouyang, Wanli and Wang, Xiaogang and Zhou, Ming},
title = {Visual Question Generation as Dual Task of Visual Question Answering},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}