DAC-GAN: Dual Auxiliary Consistency Generative Adversarial Network for Text-to-Image Generation
Synthesizing an image from a given text description faces two major challenges: the integrity of the generated image and the consistency of text-image pairs. Although considerable progress has been made, two crucial problems remain insufficiently addressed. (i) The object frame is prone to deviating or collapsing, rendering subsequent refinement ineffective. (ii) Text affects non-target regions of the image, because semantics are largely conveyed through phrases rather than isolated words; current methods rely almost exclusively on word-level cues, breaking the coherent meaning carried by phrases. To tackle these issues, we propose the Dual Auxiliary Consistency Generative Adversarial Network (DAC-GAN). Specifically, we simplify generation to a single-stage structure with two auxiliary modules: (1) a Class-Aware skeleton Consistency (CAC) module that preserves image integrity by exploiting additional supervision from prior knowledge, and (2) a Multi-label-Aware Consistency (MAC) module that strengthens the alignment of text-image pairs at the phrase level. Comprehensive experiments on two widely used datasets show that DAC-GAN maintains the integrity of the target and enhances the consistency of text-image pairs.