Discriminability Objective for Training Descriptive Captions

Ruotian Luo, Brian Price, Scott Cohen, Gregory Shakhnarovich; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6964-6974

Abstract


One property that remains lacking in image captions generated by contemporary methods is discriminability: the ability to tell two images apart given the caption for one of them. We propose a way to improve this aspect of caption generation. By incorporating into the captioning training objective a loss component directly related to the ability (of a machine) to disambiguate image/caption matches, we obtain systems that produce much more discriminative captions, according to human evaluation. Remarkably, our approach also improves other aspects of the generated captions, as reflected by a battery of standard scores such as BLEU and SPICE. Our approach is modular and can be applied to a variety of model/loss combinations commonly proposed for image captioning.
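The abstract describes the objective only at a high level. The following is a minimal sketch of what "captioning loss plus a discriminability term" could look like. It assumes that the machine's ability to disambiguate image/caption matches is scored by a hinge-based contrastive retrieval loss, using the hardest in-batch negatives, over precomputed image and caption embeddings. The function names, the cosine similarity, the margin of 0.2, and the weight `lam` are illustrative assumptions, not details taken from this page. The embeddings are taken as given; in a real system they would come from a trained image/caption retrieval model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(img_embs, cap_embs, margin=0.2):
    """Hinge-based contrastive loss with hardest in-batch negatives (assumed form).

    img_embs[i] and cap_embs[i] are treated as a matching pair; every other
    pairing in the batch is a negative. Requires a batch of at least 2.
    """
    n = len(img_embs)
    # sim[i][j] = similarity between image i and caption j
    sim = [[cosine(img_embs[i], cap_embs[j]) for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Hardest wrong caption for image i, and hardest wrong image for caption i.
        hard_cap = max(sim[i][j] for j in range(n) if j != i)
        hard_img = max(sim[j][i] for j in range(n) if j != i)
        total += max(0.0, margin + hard_cap - pos) + max(0.0, margin + hard_img - pos)
    return total / n


def combined_loss(caption_loss, img_embs, cap_embs, lam=1.0, margin=0.2):
    """Hypothetical combined objective: captioning loss + lam * discriminability loss."""
    return caption_loss + lam * contrastive_loss(img_embs, cap_embs, margin)
```

With well-separated, correctly matched embeddings the discriminability term is zero and only the captioning loss remains. If captions are swapped between images, the term becomes large. Because sampled captions are discrete, a term like this generally cannot be backpropagated through the caption generator directly and would more typically serve as (part of) a reward for a policy-gradient method. The sketch only computes the scalar value.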

Related Material


BibTeX
@InProceedings{Luo_2018_CVPR,
author = {Luo, Ruotian and Price, Brian and Cohen, Scott and Shakhnarovich, Gregory},
title = {Discriminability Objective for Training Descriptive Captions},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}