Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens

Dongwon Kim, Ju He, Qihang Yu, Chenglin Yang, Xiaohui Shen, Suha Kwak, Liang-Chieh Chen; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 18442-18452

Abstract


Image tokenizers form the foundation of modern text-to-image generative models but are notoriously difficult to train. Furthermore, most existing text-to-image models rely on large-scale, high-quality private datasets, making them challenging to replicate. In this work, we introduce **T**ext-**A**ware **T**ransformer-based 1-D**i**mensional **Tok**enizer (TA-TiTok), an efficient and powerful image tokenizer that can utilize either discrete or continuous 1-dimensional tokens. TA-TiTok uniquely integrates textual information during the tokenizer decoding stage (i.e., de-tokenization), accelerating convergence and enhancing performance. TA-TiTok also benefits from a simplified, yet effective, one-stage training process, eliminating the need for the complex two-stage distillation used in previous 1-dimensional tokenizers. This design allows for seamless scalability to large datasets. Building on this, we introduce a family of text-to-image **Mask**ed **Gen**erative Models (MaskGen), trained exclusively on open data while achieving comparable performance to models trained on private data. We aim to release both the efficient, strong TA-TiTok tokenizers and the open-data, open-weight MaskGen models to promote broader access and democratize the field of text-to-image masked generative models.
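
The text-aware de-tokenization idea can be made concrete with a short sketch. The PyTorch module below illustrates one plausible way a 1-dimensional tokenizer's decoder could attend to text embeddings alongside the compact latent tokens during reconstruction; the module structure, names, and dimensions here are assumptions for exposition, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class TextAwareDetokenizer(nn.Module):
    """Illustrative sketch: a decoder that reconstructs image patches while
    cross-attending to both compact 1-D latent tokens and text embeddings."""

    def __init__(self, dim=768, num_patches=256, depth=4):
        super().__init__()
        # Learnable queries the decoder turns back into image patches.
        self.patch_queries = nn.Parameter(torch.randn(num_patches, dim) * 0.02)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=depth)
        # Project text features (e.g., from a CLIP text encoder) into token space.
        self.text_proj = nn.Linear(dim, dim)

    def forward(self, latent_tokens, text_embeds):
        # latent_tokens: (B, num_latents, dim) -- the compact 1-D tokens
        # text_embeds:   (B, num_text_tokens, dim) -- text-encoder outputs
        memory = torch.cat([latent_tokens, self.text_proj(text_embeds)], dim=1)
        queries = self.patch_queries.unsqueeze(0).expand(latent_tokens.size(0), -1, -1)
        # Cross-attention over latent tokens *and* text is the "text-aware" step.
        return self.decoder(queries, memory)  # (B, num_patches, dim)
```

On the generation side, masked generative models in the MaskGIT family produce discrete tokens by iteratively unmasking the most confident predictions under a decaying mask schedule. The sampler below sketches that standard recipe with a hypothetical `model(tokens, text_embeds)` interface; it is not MaskGen's exact procedure.

```python
import math
import torch

@torch.no_grad()
def masked_generative_sample(model, text_embeds, seq_len=128, mask_id=4096,
                             steps=12, device="cpu"):
    B = text_embeds.size(0)
    # Start from a fully masked 1-D token sequence.
    tokens = torch.full((B, seq_len), mask_id, dtype=torch.long, device=device)
    for step in range(steps):
        logits = model(tokens, text_embeds)               # (B, seq_len, vocab)
        confidence, sampled = logits.softmax(-1).max(-1)  # greedy for brevity
        is_masked = tokens == mask_id
        # Already-committed tokens are never re-masked.
        confidence = torch.where(is_masked, confidence,
                                 torch.full_like(confidence, float("inf")))
        tokens = torch.where(is_masked, sampled, tokens)
        # Cosine schedule: fraction of tokens that stay masked after this step.
        num_remask = int(seq_len * math.cos(math.pi / 2 * (step + 1) / steps))
        if num_remask > 0:
            # Re-mask the least confident positions and retry next step.
            remask = confidence.argsort(dim=-1)[:, :num_remask]
            tokens.scatter_(1, remask, mask_id)
    return tokens  # discrete tokens to be de-tokenized into an image
```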

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Kim_2025_ICCV,
    author    = {Kim, Dongwon and He, Ju and Yu, Qihang and Yang, Chenglin and Shen, Xiaohui and Kwak, Suha and Chen, Liang-Chieh},
    title     = {Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {18442-18452}
}