[pdf]
[supp]
[bibtex]
@InProceedings{Tai_2025_WACV,
  author    = {Tai, Yu-Shan and Wu, An-Yeu},
  title     = {AMP-ViT: Optimizing Vision Transformer Efficiency with Adaptive Mixed-Precision Post-Training Quantization},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {6828-6837}
}
AMP-ViT: Optimizing Vision Transformer Efficiency with Adaptive Mixed-Precision Post-Training Quantization
Abstract
Vision transformers (ViTs) have revolutionized computer vision but face significant challenges due to their high computational and memory demands. Existing post-training quantization methods struggle to maintain performance at low bit-widths due to activation asymmetry and their reliance on manual configurations. To overcome these challenges, we introduce SymAlign, which addresses activation asymmetry and reduces clamping loss. Additionally, we propose AutoScale, an automatic, data-driven mechanism that adapts to varying activation distributions. Incorporating these techniques, we propose an adaptive mixed-precision post-training quantization framework for vision transformers (AMP-ViT). Our comprehensive approach addresses asymmetry, varying distributions, and uneven sensitivities, making it the first to tackle these challenges thoroughly. Our experiments on ViT, DeiT, and Swin demonstrate significant accuracy improvements over state-of-the-art methods on the ImageNet dataset. Specifically, our proposed methods achieve accuracy improvements ranging from 0.90% to 23.35% on 4-bit ViTs with single-precision quantization, and from 3.82% to 78.14% on 5-bit fully quantized ViTs with mixed-precision quantization.
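The activation-asymmetry issue the abstract refers to can be seen in a small numerical sketch. The example below compares a plain uniform symmetric quantizer against an affine (zero-point) quantizer on a skewed, GELU-like activation distribution; the function names, the 4-bit setting, and the toy distribution are illustrative assumptions, and this is not the paper's SymAlign or AutoScale implementation.

```python
# Hypothetical illustration of why asymmetric (skewed) activations hurt
# symmetric quantization. This is a generic sketch, not the paper's method.
import numpy as np

def quantize_symmetric(x, n_bits=4):
    """Uniform symmetric quantizer: the grid is centered at zero, so half
    of the codes cover the (nearly empty) negative range of skewed data."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def quantize_asymmetric(x, n_bits=4):
    """Uniform affine quantizer: a zero point shifts the grid onto the
    actual data range, giving a finer step size over occupied values."""
    qmax = 2 ** n_bits - 1
    scale = (x.max() - x.min()) / qmax
    zero_point = np.round(-x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale

# Post-GELU activations are skewed: bounded below, with a long positive tail.
rng = np.random.default_rng(0)
x = np.maximum(rng.normal(0.0, 1.0, 10_000), -0.17)  # crude GELU-like skew

for name, fn in [("symmetric", quantize_symmetric),
                 ("asymmetric", quantize_asymmetric)]:
    mse = np.mean((x - fn(x)) ** 2)
    print(f"{name:10s} 4-bit MSE: {mse:.5f}")
```

On this toy distribution the affine quantizer yields a noticeably lower reconstruction error, since the symmetric grid wastes codes on values that never occur; the paper's contributions concern handling this asymmetry, together with scale selection and bit-width allocation, within a post-training setting.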
Related Material