An Unified Framework for Language Guided Image Completion

Kim, Jihyun; Jeong, Seong-Hun; Kong, Kyeongbo; Kang, Suk-Ju

Jihyun Kim, Seong-Hun Jeong, Kyeongbo Kong, Suk-Ju Kang; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 2568-2578

Abstract

Image completion aims to generate visual content for the unknown regions of an image. Image outpainting and wide-range image blending, which we refer to as extensive painting, are considered challenging because relatively little context is available compared to the large unknown regions. Some recent studies have tried to reduce the complexity of extensive painting by generating image hints for the missing regions. In this paper, we introduce a novel modality of hints: natural language. Moreover, we propose a Captioning-based Extensive Painting (CEP) module, which combines models for two different multi-modal tasks: image captioning and text-guided image completion. To generate appropriate captions for masked images, the image captioning model is optimized using the self-critical sequence training (SCST) method with random masks. The biggest benefit of our methodology is that it can use well-designed image captioning and text-guided image manipulation models such as OFA and GLIDE without additional architectural changes. In evaluation, our model demonstrates remarkable performance, both quantitatively and qualitatively, even on complicated image datasets.
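The two-stage idea described in the abstract (caption the masked image, then use the caption to guide completion) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the rectangular mask shape, and the `captioner` and `completer` callables are hypothetical stand-ins for models such as OFA and GLIDE, whose real interfaces are not shown here. The `scst_loss` function uses the standard SCST form, in which the reward of a greedy-decoded caption serves as the baseline for a sampled caption; the reward actually used in the paper is not stated in the abstract.

```python
import random
from typing import Callable, List, Tuple

Image = List[List[float]]  # toy grayscale image: rows of pixel values
Mask = List[List[int]]     # 1 = known pixel, 0 = unknown pixel


def random_rect_mask(height: int, width: int, keep_ratio: float,
                     rng: random.Random) -> Mask:
    """Keep one randomly placed rectangle; everything else is unknown.

    The rectangular shape is an assumption made for illustration only.
    """
    kh = max(1, int(height * keep_ratio))
    kw = max(1, int(width * keep_ratio))
    top = rng.randint(0, height - kh)
    left = rng.randint(0, width - kw)
    return [[1 if top <= r < top + kh and left <= c < left + kw else 0
             for c in range(width)]
            for r in range(height)]


def apply_mask(image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Replace unknown pixels with a constant fill value."""
    return [[p if m else fill for p, m in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]


def caption_then_complete(
    image: Image,
    mask: Mask,
    captioner: Callable[[Image], str],              # hypothetical captioning model
    completer: Callable[[Image, Mask, str], Image],  # hypothetical text-guided model
) -> Tuple[str, Image]:
    """Caption the masked image, then pass the caption on as a language hint."""
    masked = apply_mask(image, mask)
    caption = captioner(masked)
    return caption, completer(masked, mask, caption)


def scst_loss(sample_logprob: float, sampled_reward: float,
              greedy_reward: float) -> float:
    """Standard SCST objective for a single sampled caption.

    The greedy caption's reward is the baseline, so a sampled caption that
    beats greedy decoding has its log-probability pushed up.
    """
    return -(sampled_reward - greedy_reward) * sample_logprob
```

In this sketch the two models stay black boxes, which mirrors the abstract's point that existing captioning and completion models can be combined without architectural changes.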

Related Material

[bibtex]

@InProceedings{Kim_2023_WACV,
    author    = {Kim, Jihyun and Jeong, Seong-Hun and Kong, Kyeongbo and Kang, Suk-Ju},
    title     = {An Unified Framework for Language Guided Image Completion},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
    pages     = {2568-2578}
}