Partial Off-Policy Learning: Balance Accuracy and Diversity for Human-Oriented Image Captioning

Jiahe Shi, Yali Li, Shengjin Wang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 2187-2196

Abstract


Human-oriented image captioning with both high diversity and accuracy is a challenging task in vision-and-language modeling. Reinforcement learning (RL) based frameworks promote the accuracy of image captioning, yet seriously hurt its diversity. In contrast, other methods based on variational auto-encoders (VAE) or generative adversarial networks (GAN) can produce diverse yet less accurate captions. In this work, we focus on promoting the diversity of RL-based image captioning. Specifically, we devise a partial off-policy learning scheme to balance accuracy and diversity. First, we keep the model exposed to varied candidate captions by sampling from the initial state before RL is launched. Second, a novel criterion named max-CIDEr is proposed to serve as the reward for promoting diversity. We combine this off-policy strategy with the on-policy one to moderate the exploration effect, further balancing diversity and accuracy for human-like image captioning. Experiments show that our method lies closest to human performance in the diversity-accuracy space and achieves the highest Pearson correlation (0.337) with human performance.
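Since the abstract only sketches the scheme, below is a minimal, hypothetical Python sketch of the two ingredients it names: a max-CIDEr-style reward and a partial mix of on-policy and off-policy samples. The function names, the down-weighting factor `mix`, and the reading of max-CIDEr as a per-candidate maximum over single-reference CIDEr scores are assumptions for illustration, not the authors' implementation; a real setup would plug in an actual CIDEr-D scorer and feed the resulting advantages into a SCST-style policy-gradient update.

```python
# Illustrative sketch only; names and the exact reward definition are assumptions.
from typing import Callable, List, Sequence


def max_cider_reward(candidates: List[str],
                     refs: Sequence[str],
                     cider: Callable[[str, Sequence[str]], float]) -> List[float]:
    """Assumed reading of max-CIDEr: reward each candidate by its best
    single-reference CIDEr score, so it is pushed to match *some* human
    caption closely instead of averaging toward a generic one."""
    return [max(cider(c, [r]) for r in refs) for c in candidates]


def partial_off_policy_advantages(on_policy_caps: List[str],
                                  off_policy_caps: List[str],
                                  refs: Sequence[str],
                                  cider: Callable[[str, Sequence[str]], float],
                                  mix: float = 0.5) -> List[float]:
    """Mix captions sampled from the current policy (on-policy) with captions
    sampled from the frozen pre-RL model (off-policy), reward them with
    max-CIDEr, and use the batch mean as a simple baseline."""
    caps = on_policy_caps + off_policy_caps
    rewards = max_cider_reward(caps, refs, cider)
    baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]
    # Down-weight the off-policy half so exploration does not dominate accuracy.
    n_on = len(on_policy_caps)
    return [a if i < n_on else mix * a for i, a in enumerate(advantages)]


if __name__ == "__main__":
    # Toy stand-in for CIDEr (word-overlap fraction), for illustration only.
    def toy_cider(cap: str, refs: Sequence[str]) -> float:
        ref_words = set(refs[0].split())
        cap_words = cap.split()
        return sum(w in ref_words for w in cap_words) / max(len(cap_words), 1)

    advs = partial_off_policy_advantages(
        on_policy_caps=["a dog runs on grass"],
        off_policy_caps=["a brown dog plays outside", "an animal in a field"],
        refs=["a dog playing on the grass", "a brown dog running outdoors"],
        cider=toy_cider,
    )
    print(advs)
```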

Related Material


[bibtex]
@InProceedings{Shi_2021_ICCV,
    author    = {Shi, Jiahe and Li, Yali and Wang, Shengjin},
    title     = {Partial Off-Policy Learning: Balance Accuracy and Diversity for Human-Oriented Image Captioning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {2187-2196}
}