Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations

Chen, Yu-Hui; Sarokin, Raman; Lee, Juhyun; Tang, Jiuqiang; Chang, Chuo-Ling; Kulik, Andrei; Grundmann, Matthias

Yu-Hui Chen, Raman Sarokin, Juhyun Lee, Jiuqiang Tang, Chuo-Ling Chang, Andrei Kulik, Matthias Grundmann; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 4651-4655

Abstract

The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters and pose challenges due to restricted computational and memory resources on devices. We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to date (under 12 seconds for Stable Diffusion 1.4 without INT8 quantization for a 512 × 512 image with 20 iterations) on GPU-equipped mobile devices. These enhancements broaden the applicability of generative AI and improve the overall user experience across a wide range of devices.

Related Material

[bibtex]

@InProceedings{Chen_2023_CVPR,
    author    = {Chen, Yu-Hui and Sarokin, Raman and Lee, Juhyun and Tang, Jiuqiang and Chang, Chuo-Ling and Kulik, Andrei and Grundmann, Matthias},
    title     = {Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {4651-4655}
}