Figure Captioning with Relation Maps for Reasoning

Charles Chen, Ruiyi Zhang, Eunyee Koh, Sungchul Kim, Scott Cohen, Ryan Rossi; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 1537-1545


Figures, such as line plots, pie charts, and bar charts, are widely used to convey important information in a concise format. In this work, we investigate the problem of figure caption generation, where the goal is to automatically generate a natural language description for a given figure. While natural image captioning has been studied extensively, figure captioning has received relatively little attention and remains a challenging problem. A successful solution to this task has many potential applications, such as: 1) adding captions to the output of a visualization tool; 2) summarizing documents that contain a number of figures, with or without proper captions; 3) improving user experience by making figure content accessible to those with visual impairment. To solve this problem, we collect a dataset, FigCAP, for testing the capability of generating captions, and propose a captioning framework with novel attention models. To address the exposure bias issue, we further train the captioning model with a sequence-level policy based on reinforcement learning, which directly optimizes evaluation metrics. Extensive experiments show that our proposed models outperform strong image captioning baselines, demonstrating significant potential for automatically generating captions for figures.
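
The sequence-level training idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the reward, the baseline, or the policy-gradient estimator, so the REINFORCE-style loss with a baseline shown here, the toy `unigram_overlap` metric, and all function names are assumptions made purely for illustration.

```python
import math


def unigram_overlap(candidate, reference):
    """Toy stand-in for a captioning metric (assumption, not the paper's metric):
    the fraction of candidate tokens that also appear in the reference."""
    cand_tokens = candidate.split()
    ref_tokens = set(reference.split())
    if not cand_tokens:
        return 0.0
    return sum(tok in ref_tokens for tok in cand_tokens) / len(cand_tokens)


def sequence_level_loss(token_log_probs, reward, baseline):
    """REINFORCE-style loss for one sampled caption.

    token_log_probs: log-probabilities the model assigned to each sampled token.
    reward: sequence-level score of the sampled caption.
    baseline: reference score subtracted to reduce variance (hypothetical choice).
    """
    advantage = reward - baseline
    # Minimizing this loss raises the likelihood of captions that beat the baseline
    # and lowers the likelihood of captions that fall below it.
    return -advantage * sum(token_log_probs)


# Hypothetical example: score a sampled caption against a baseline caption.
reference = "the blue line increases steadily"
sampled_caption = "the blue line decreases"
baseline_caption = "the red bar"

reward = unigram_overlap(sampled_caption, reference)
baseline = unigram_overlap(baseline_caption, reference)
log_probs = [math.log(0.5)] * len(sampled_caption.split())
loss = sequence_level_loss(log_probs, reward, baseline)
```

Because the reward is computed on the model's own sampled caption rather than on ground-truth prefixes, training of this kind exposes the model to its own predictions, which is the usual motivation for using it against exposure bias.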

Related Material

author = {Chen, Charles and Zhang, Ruiyi and Koh, Eunyee and Kim, Sungchul and Cohen, Scott and Rossi, Ryan},
title = {Figure Captioning with Relation Maps for Reasoning},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {March},
year = {2020}