AIpparel: A Multimodal Foundation Model for Digital Garments

Kiyohiro Nakayama, Jan Ackermann, Timur Levent Kesdogan, Yang Zheng, Maria Korosteleva, Olga Sorkine-Hornung, Leonidas J. Guibas, Guandao Yang, Gordon Wetzstein; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 8138-8149

Abstract


Apparel is essential to human life, offering protection, mirroring cultural identities, and showcasing personal style. Yet, the creation of garments remains a time-consuming process, largely due to the manual work involved in designing them. To simplify this process, we introduce AIpparel, a multimodal foundation model for generating and editing sewing patterns. Our model fine-tunes state-of-the-art large multimodal models (LMMs) on a custom-curated large-scale dataset of over 120,000 unique garments, each with multimodal annotations including text, images, and sewing patterns. Additionally, we propose a novel tokenization scheme that concisely encodes these complex sewing patterns so that LLMs can learn to predict them efficiently. AIpparel achieves state-of-the-art performance in single-modal tasks, including text-to-garment and image-to-garment prediction, and enables novel multimodal garment generation applications such as interactive garment editing. The project website is at https://georgenakayama.github.io/AIpparel/.
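The abstract only sketches the tokenization idea at a high level. As an illustration of the general approach it describes, i.e. serializing sewing-pattern geometry into a discrete vocabulary that a language model can predict, the following minimal Python sketch quantizes the vertices of a 2D panel into coordinate tokens. The panel representation, bin count, and special tokens here are assumptions made for illustration and are not AIpparel's actual tokenization scheme, which also needs to encode edge curvature, stitch correspondences, and panel placement.

# Illustrative sketch only: a toy scheme for serializing a sewing-pattern
# panel into discrete tokens a language model could consume. The bin count,
# special tokens, and panel representation are assumptions, not the
# tokenization used in AIpparel.

from dataclasses import dataclass
from typing import List, Tuple

NUM_BINS = 256           # assumed quantization resolution for 2D coordinates
PANEL_START = "<panel>"  # assumed special tokens delimiting one panel
PANEL_END = "</panel>"

@dataclass
class Panel:
    """A garment panel as a closed 2D polygon (one (x, y) pair per vertex)."""
    vertices: List[Tuple[float, float]]

def quantize(value: float, lo: float, hi: float) -> int:
    """Map a coordinate in [lo, hi] to an integer bin in [0, NUM_BINS - 1]."""
    t = (value - lo) / (hi - lo)
    return min(NUM_BINS - 1, max(0, int(t * NUM_BINS)))

def tokenize_panel(panel: Panel, bounds: Tuple[float, float] = (-1.0, 1.0)) -> List[str]:
    """Serialize a panel into a flat token sequence: <panel> x0 y0 x1 y1 ... </panel>."""
    lo, hi = bounds
    tokens = [PANEL_START]
    for x, y in panel.vertices:
        tokens.append(f"<x_{quantize(x, lo, hi)}>")
        tokens.append(f"<y_{quantize(y, lo, hi)}>")
    tokens.append(PANEL_END)
    return tokens

if __name__ == "__main__":
    # A unit-square panel; real sewing patterns carry far richer structure.
    square = Panel(vertices=[(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)])
    print(tokenize_panel(square))

The appeal of such a discrete encoding is that a pattern becomes an ordinary token sequence, so a pretrained multimodal language model can be fine-tuned to emit it with a standard next-token objective.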

Related Material


[bibtex]
@InProceedings{Nakayama_2025_CVPR,
    author    = {Nakayama, Kiyohiro and Ackermann, Jan and Kesdogan, Timur Levent and Zheng, Yang and Korosteleva, Maria and Sorkine-Hornung, Olga and Guibas, Leonidas J. and Yang, Guandao and Wetzstein, Gordon},
    title     = {AIpparel: A Multimodal Foundation Model for Digital Garments},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {8138-8149}
}