@InProceedings{Jiang_2025_CVPR,
    author    = {Jiang, Jiaxiu and Zhang, Yabo and Feng, Kailai and Wu, Xiaohe and Li, Wenbo and Pei, Renjing and Li, Fan and Zuo, Wangmeng},
    title     = {MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {2802-2812}
}
MC^2: Multi-concept Guidance for Customized Multi-concept Generation
Abstract
Customized text-to-image generation, which synthesizes images based on user-specified concepts, has made significant progress in handling individual concepts. However, when extended to multiple concepts, existing methods often struggle with properly integrating different models and avoiding the unintended blending of characteristics from distinct concepts. In this paper, we propose MC^2, a novel approach for multi-concept customization that enhances flexibility and fidelity through inference-time optimization. MC^2 enables the integration of multiple single-concept models with heterogeneous architectures. By adaptively refining attention weights between visual and textual tokens, our method ensures that image regions accurately correspond to their associated concepts while minimizing interference between concepts. Extensive experiments demonstrate that MC^2 outperforms training-based methods in terms of prompt-reference alignment. Furthermore, MC^2 can be seamlessly applied to text-to-image generation, providing robust compositional capabilities. To facilitate the evaluation of multi-concept customization, we also introduce a new benchmark, MC++. The code is available at https://github.com/JIANGJiaXiu/MC-2.
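The abstract's core mechanism — refining attention weights so that each image region attends to its own concept's text tokens rather than another concept's — can be illustrated with a toy sketch. Everything here is hypothetical: the function name, the fixed 0.1 down-weighting factor, and the hard region-to-concept assignment are stand-ins for illustration only; the paper's actual method performs adaptive, gradient-based inference-time optimization rather than this static masking.

```python
import numpy as np

def refine_attention(attn, region_assignment, concept_tokens, suppress=0.1):
    """Toy stand-in for multi-concept attention refinement.

    attn:              (n_visual, n_text) row-stochastic attention matrix
    region_assignment: concept index assigned to each visual token
    concept_tokens:    dict mapping concept index -> list of its text-token indices
    suppress:          factor applied to attention on other concepts' tokens
    """
    refined = attn.copy()
    n_visual, n_text = attn.shape
    for i in range(n_visual):
        own = np.zeros(n_text, dtype=bool)
        own[concept_tokens[region_assignment[i]]] = True
        refined[i, ~own] *= suppress          # damp cross-concept attention
        refined[i] /= refined[i].sum()        # renormalize to a distribution
    return refined

# Two concepts, each owning three text tokens; four visual tokens split between them.
attn = np.full((4, 6), 1.0 / 6.0)            # uniform attention to start
refined = refine_attention(
    attn,
    region_assignment=[0, 0, 1, 1],
    concept_tokens={0: [0, 1, 2], 1: [3, 4, 5]},
)
```

After refinement each row still sums to 1, but the attention mass each region places on its assigned concept's tokens increases (here from 0.5 to about 0.91), which is the qualitative effect the method aims for: image regions correspond to their own concepts with less interference from the others.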