ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts

Petrov, Dmitry; Goyal, Pradyumn; Shivashok, Divyansh; Tao, Yuanming; Averkiou, Melinos; Kalogerakis, Evangelos

Dmitry Petrov, Pradyumn Goyal, Divyansh Shivashok, Yuanming Tao, Melinos Averkiou, Evangelos Kalogerakis; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 13305-13314

Abstract

We introduce ShapeWords, an approach for synthesizing images based on 3D shape guidance and text prompts. ShapeWords incorporates target 3D shape information within specialized tokens embedded together with the input text, effectively blending 3D shape awareness with textual context to guide the image synthesis process. Unlike conventional shape guidance methods that rely on depth maps restricted to fixed viewpoints and often overlook full 3D structure or textual context, ShapeWords generates diverse yet consistent images that reflect both the target shape's geometry and the textual description. Experimental results show that ShapeWords produces images that are more text-compliant and aesthetically plausible while also maintaining 3D shape awareness.

Related Material


@InProceedings{Petrov_2025_CVPR,
    author    = {Petrov, Dmitry and Goyal, Pradyumn and Shivashok, Divyansh and Tao, Yuanming and Averkiou, Melinos and Kalogerakis, Evangelos},
    title     = {ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {13305-13314}
}