STEPS: Sequential Probability Tensor Estimation for Text-to-Image Hard Prompt Search
Abstract
Recent text-to-image (T2I) diffusion models have demonstrated remarkable capabilities in visual synthesis, yet their performance depends heavily on the quality of the input prompts. However, optimizing discrete prompts remains challenging: the discrete nature of tokens prevents the direct application of gradient descent, and the search space of possible token combinations is vast. As a result, existing approaches either suffer from quantization errors when they rely on continuous optimization techniques or become trapped in local optima due to coordinate-wise greedy search. In this paper, we propose STEPS, a novel Sequential probability Tensor Estimation approach for hard Prompt Search. Our method reformulates discrete prompt optimization as a sequential probability tensor estimation problem and leverages the inherent low-rank structure of the tensor to address the curse of dimensionality. To further improve computational efficiency, we develop a memory-bounded sampling approach that shrinks the prompt space without dependence on the iteration step, while preserving the sequential optimization dynamics. Extensive experiments on several public datasets demonstrate that our method consistently outperforms existing approaches in T2I generation, cross-model prompt transferability, and harmful prompt optimization, validating the effectiveness of the proposed framework.
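To give a rough sense of what sequential sampling from a low-rank probability tensor can look like, the minimal sketch below samples a token sequence from a non-negative tensor-train factorization of the joint token distribution. The abstract does not specify the factorization or the sampling routine; the function name, shapes, and tensor-train choice here are illustrative assumptions, not the paper's actual algorithm.

# Minimal sketch (not the authors' code): sample a prompt sequentially from a
# joint token-probability tensor stored in low-rank tensor-train form.
import numpy as np

def sample_prompt_from_tt(cores, rng):
    """Sample one token sequence from a non-negative tensor-train factorization.

    cores[t] has shape (r_{t-1}, V, r_t) with r_0 = r_L = 1; entries are assumed
    non-negative so partial contractions remain valid unnormalized probabilities.
    """
    L = len(cores)
    # Right marginals: R[t] sums out all tokens after position t.
    R = [None] * (L + 1)
    R[L] = np.ones(1)
    for t in reversed(range(L)):
        R[t] = cores[t].sum(axis=1) @ R[t + 1]          # shape (r_{t-1},)

    tokens = []
    left = np.ones(1)                                    # running prefix message
    for t in range(L):
        # Unnormalized conditional p(x_t | sampled prefix), marginalized over the suffix.
        scores = np.einsum("i,ivj,j->v", left, cores[t], R[t + 1])
        probs = scores / scores.sum()
        x_t = rng.choice(len(probs), p=probs)
        tokens.append(int(x_t))
        left = left @ cores[t][:, x_t, :]                # absorb the chosen token
    return tokens

# Toy usage: prompt length 4, vocabulary of 6 tokens, TT-rank 3 (all hypothetical sizes).
rng = np.random.default_rng(0)
L, V, r = 4, 6, 3
ranks = [1, r, r, r, 1]
cores = [rng.random((ranks[t], V, ranks[t + 1])) for t in range(L)]
print(sample_prompt_from_tt(cores, rng))

Precomputing the right marginals keeps every conditional exact while the per-position cost depends only on the TT-ranks and vocabulary size, which is the kind of dimensionality reduction a low-rank reformulation is meant to buy.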
Related Material
[pdf] [supp] [bibtex]
@InProceedings{Qiu_2025_CVPR,
    author    = {Qiu, Yuning and Wang, Andong and Li, Chao and Huang, Haonan and Zhou, Guoxu and Zhao, Qibin},
    title     = {STEPS: Sequential Probability Tensor Estimation for Text-to-Image Hard Prompt Search},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {28640-28650}
}