Webly Supervised Knowledge Embedding Model for Visual Reasoning

Wenbo Zheng, Lan Yan, Chao Gou, Fei-Yue Wang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 12445-12454

Abstract


Visual reasoning between visual image and natural language description is a long-standing challenge in computer vision. While recent approaches offer a great promise by compositionality or relational computing, most of them are oppressed by the challenge of training with datasets containing only a limited number of images with ground-truth texts. Besides, it is extremely time-consuming and difficult to build a larger dataset by annotating millions of images with text descriptions that may very likely lead to a biased model. Inspired by the majority success of webly supervised learning, we utilize readily-available web images with its noisy annotations for learning a robust representation. Our key idea is to presume on web images and corresponding tags along with fully annotated datasets in learning with knowledge embedding. We present a two-stage approach for the task that can augment knowledge through an effective embedding model with weakly supervised web data. This approach learns not only knowledge-based embeddings derived from key-value memory networks to make joint and full use of textual and visual information but also exploits the knowledge to improve the performance with knowledge-based representation learning for applying other general reasoning tasks. Experimental results on two benchmarks show that the proposed approach significantly improves performance compared with the state-of-the-art methods and guarantees the robustness of our model against visual reasoning tasks and other reasoning tasks.

Related Material


[pdf]
[bibtex]
@InProceedings{Zheng_2020_CVPR,
author = {Zheng, Wenbo and Yan, Lan and Gou, Chao and Wang, Fei-Yue},
title = {Webly Supervised Knowledge Embedding Model for Visual Reasoning},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}