Agentic Retoucher for Text-To-Image Generation

Shen, Shaocheng; Liang, Jianfeng; Cai, Chunlei; Geng, Cong; Duan, Huiyu; Zhang, Xiaoyun; Hu, Qiang; Zhai, Guangtao

Shaocheng Shen, Jianfeng Liang, Chunlei Cai, Cong Geng, Huiyu Duan, Xiaoyun Zhang, Qiang Hu, Guangtao Zhai; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 29114-29125

Abstract

Text-to-image (T2I) diffusion models such as SDXL and FLUX have achieved impressive photorealism, yet small-scale distortions remain pervasive in regions such as limbs, faces, and text. Existing refinement approaches either perform costly iterative re-generation or rely on vision-language models (VLMs) with weak spatial grounding, leading to semantic drift and unreliable local edits. To close this gap, we propose **Agentic Retoucher**, a hierarchical, decision-driven framework that reformulates post-generation correction as a human-like *perception-reasoning-action* loop. Specifically, we design (1) a **perception agent** that learns contextual saliency for fine-grained distortion localization under text-image consistency cues, (2) a **reasoning agent** that performs human-aligned inferential diagnosis via progressive preference alignment, and (3) an **action agent** that adaptively plans localized inpainting guided by user preference. This design integrates perceptual evidence, linguistic reasoning, and controllable correction into a unified, self-corrective decision process. To enable fine-grained supervision and quantitative evaluation, we further construct **GenBlemish-27K**, a dataset of 6K T2I images with 27K annotated artifact regions across 12 categories. Extensive experiments demonstrate that Agentic Retoucher consistently outperforms state-of-the-art methods in perceptual quality, distortion localization, and human preference alignment, establishing a new paradigm for self-corrective and perceptually reliable T2I generation.

Related Material

@InProceedings{Shen_2026_CVPR,
    author    = {Shen, Shaocheng and Liang, Jianfeng and Cai, Chunlei and Geng, Cong and Duan, Huiyu and Zhang, Xiaoyun and Hu, Qiang and Zhai, Guangtao},
    title     = {Agentic Retoucher for Text-To-Image Generation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {29114-29125}
}