Noise-Aware Learning from Web-Crawled Image-Text Data for Image Captioning

Wooyoung Kang, Jonghwan Mun, Sungjun Lee, Byungseok Roh; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 2942-2952

Abstract


Image captioning is one of the straightforward tasks that can take advantage of large-scale web-crawled data, which provides rich knowledge about the visual world for a captioning model. However, since web-crawled data contains image-text pairs that are aligned at different levels, the inherent noise (e.g., misaligned pairs) makes it difficult to learn a precise captioning model. While a filtering strategy can effectively remove noisy data, it reduces the amount of learnable knowledge and sometimes introduces a new problem of data deficiency. To get the best of both worlds, we propose a Noise-aware Captioning (NoC) framework, which learns rich knowledge from the whole web-crawled data while being less affected by the noise. This is achieved by the proposed alignment-level-controllable captioner, which is trained using the alignment levels of the image-text pairs as a control signal. The alignment-level-conditioned training allows the model to generate high-quality captions by simply setting the control signal to the desired alignment level at inference time. An in-depth analysis shows the effectiveness of our framework in handling noise. With two tasks, zero-shot captioning and text-to-image retrieval using generated captions (i.e., self-retrieval), we also demonstrate that our model can produce high-quality captions in terms of descriptiveness and distinctiveness. The code is available at https://github.com/kakaobrain/noc.
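
The idea of conditioning on an alignment level can be illustrated with a small sketch. The abstract does not say how alignment levels are computed or how the control signal is fed to the captioner, so everything below is an assumption made for illustration: each pair is assumed to come with a precomputed image-text similarity score, scores are bucketed into discrete levels by quantile, and the level is attached to the caption as a prefix token. The function names and the `<align_k>` token format are hypothetical and are not taken from the NoC code.

```python
from bisect import bisect_right
from typing import List, Sequence


def quantile_edges(scores: Sequence[float], num_levels: int) -> List[float]:
    """Return num_levels - 1 thresholds splitting scores into roughly equal-sized buckets."""
    ordered = sorted(scores)
    n = len(ordered)
    return [ordered[(i * n) // num_levels] for i in range(1, num_levels)]


def alignment_level(score: float, edges: Sequence[float]) -> int:
    """Map a similarity score to a level in 1..len(edges)+1 (higher means better aligned)."""
    return bisect_right(edges, score) + 1


def with_control_token(caption: str, level: int) -> str:
    """Prefix a caption with a hypothetical control token encoding its alignment level."""
    return f"<align_{level}> {caption}"


# Assumed, made-up similarity scores for eight image-text pairs.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
edges = quantile_edges(scores, num_levels=4)

# Training time: every pair is kept, tagged with its own level.
noisy_example = with_control_token("buy now, free shipping", alignment_level(0.1, edges))
clean_example = with_control_token("a dog on a beach", alignment_level(0.8, edges))
```

Under these assumptions, no pair is filtered out during training; at inference time the caller would fix the control signal to the highest level to request a well-aligned caption.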

Related Material


[bibtex]
@InProceedings{Kang_2023_ICCV,
    author    = {Kang, Wooyoung and Mun, Jonghwan and Lee, Sungjun and Roh, Byungseok},
    title     = {Noise-Aware Learning from Web-Crawled Image-Text Data for Image Captioning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {2942-2952}
}