Grounded, Controllable and Debiased Image Completion With Lexical Semantics

Zhang, Shengyu; Jiang, Tan; Huang, Qinghao; Tan, Ziqi; Kuang, Kun; Zhao, Zhou; Tang, Siliang; Yu, Jin; Yang, Hongxia; Yang, Yi; Wu, Fei

Shengyu Zhang, Tan Jiang, Qinghao Huang, Ziqi Tan, Kun Kuang, Zhou Zhao, Siliang Tang, Jin Yu, Hongxia Yang, Yi Yang, Fei Wu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, pp. 1748-1751

Abstract

In this paper, we present Lexical Semantic Image Completion (LSIC), an approach with potential applications in art, design, and heritage conservation, among others. Existing image completion procedures are highly subjective because they consider only the visual context, which may produce unpredictable results that are plausible but not faithful to grounded knowledge. To permit a completion process that is both grounded and controllable, we advocate generating results faithful to both the visual context and the lexical semantic context, i.e., a description of the holes or blank regions left in the image (the hole description). One major challenge for LSIC comes from modeling and aligning the structure of the visual-semantic context and translating across modalities. We devise multi-grained reasoning blocks to address this challenge. Another challenge relates to unimodal bias, which occurs when the model generates plausible results without using the textual description. We devise an unsupervised unpaired-creation learning path that explicitly performs counterfactual thinking, i.e., it asks what the complete image would be if the incomplete image were given an unpaired text description. A cycle consistency loss is devised to guarantee counterfactual faithfulness. We conduct extensive quantitative and qualitative experiments that reveal the strengths of LSIC in being grounded, controllable, and debiased.

Related Material

[bibtex]

@InProceedings{Zhang_2021_CVPR,
    author    = {Zhang, Shengyu and Jiang, Tan and Huang, Qinghao and Tan, Ziqi and Kuang, Kun and Zhao, Zhou and Tang, Siliang and Yu, Jin and Yang, Hongxia and Yang, Yi and Wu, Fei},
    title     = {Grounded, Controllable and Debiased Image Completion With Lexical Semantics},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2021},
    pages     = {1748-1751}
}