Selectively Extracting and Injecting Visual Attributes into Text-to-Image Models

Choi, Seunghwan; Yun, Jooyeol; Lee, Youngdo; Choo, Jaegul

Seunghwan Choi, Jooyeol Yun, Youngdo Lee, Jaegul Choo; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 21976-21985

Abstract

Text-to-image models are increasingly utilized in design workflows, but articulating nuanced design intentions solely through text remains a challenge. This work proposes a method that extracts visual attributes from a reference image and injects them directly into the generation pipeline. Specifically, the method optimizes a text token to selectively represent the target attribute using a custom training prompt and two novel embeddings, termed distilled and residual. A wide range of attributes can be extracted through this approach, including shape, material, pose, and camera shot and angle. The effectiveness of the method is validated on various target attributes and text prompts drawn from a newly constructed dataset. The method outperforms existing approaches in selectively extracting and applying target attributes across diverse contexts. Ultimately, the proposed method enables intuitive and controllable text-to-image generation, streamlining the design process.

Related Material

[bibtex]

@InProceedings{Choi_2026_CVPR,
    author    = {Choi, Seunghwan and Yun, Jooyeol and Lee, Youngdo and Choo, Jaegul},
    title     = {Selectively Extracting and Injecting Visual Attributes into Text-to-Image Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {21976-21985}
}