Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

Zanlin Ni, Yulin Wang, Renping Zhou, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Shiji Song, Yuan Yao, Gao Huang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 7007-7016

Abstract


The field of image synthesis is currently flourishing due to the advancements in diffusion models. While diffusion models have been successful their computational intensity has prompted the pursuit of more efficient alternatives. As a representative work non-autoregressive Transformers (NATs) have been recognized for their rapid generation. However a major drawback of these models is their inferior performance compared to diffusion models. In this paper we aim to re-evaluate the full potential of NATs by revisiting the design of their training and inference strategies. Specifically we identify the complexities in properly configuring these strategies and indicate the possible sub-optimality in existing heuristic-driven designs. Recognizing this we propose to go beyond existing methods by directly solving the optimal strategies in an automatic framework. The resulting method named AutoNAT advances the performance boundaries of NATs notably and is able to perform comparably with the latest diffusion models with a significantly reduced inference cost. The effectiveness of AutoNAT is comprehensively validated on four benchmark datasets i.e. ImageNet-256 & 512 MS-COCO and CC3M. Code and pre-trained models will be available at https://github.com/LeapLabTHU/ImprovedNAT.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Ni_2024_CVPR, author = {Ni, Zanlin and Wang, Yulin and Zhou, Renping and Guo, Jiayi and Hu, Jinyi and Liu, Zhiyuan and Song, Shiji and Yao, Yuan and Huang, Gao}, title = {Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {7007-7016} }