FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation
Abstract
Post-training quantization (PTQ) has stood out as a cost-effective and promising model compression approach over recent years, as it eliminates the need for retraining on the entire dataset. Unfortunately, most existing PTQ methods for Vision Transformers (ViTs) exhibit a notable drop in accuracy, especially in low-bit cases. To tackle these challenges, we analyze the extensively utilized Hessian-guided quantization loss, and uncover certain limitations within the approximated pre-activation Hessian. Following the block-by-block reconstruction paradigm of PTQ, we first derive a quantization loss based on the Fisher Information Matrix (FIM). Due to the large scale of the complete FIM, we establish the relationship between KL divergence and FIM in the PTQ scenario to enable fast computation of the quantization loss during reconstruction. Subsequently, we develop a Diagonal Plus Low-Rank (DPLR) estimation on FIM to achieve a more nuanced quantization loss. Our extensive experiments, conducted across various vision tasks with distinct representative ViT-based architectures on public benchmark datasets, demonstrate that our method outperforms the state-of-the-art approaches, especially in the case of low-bit quantization. The source code is available at https://github.com/ShiheWang/FIMA-Q.
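To make the diagonal-plus-low-rank (DPLR) idea concrete, the following is a minimal PyTorch sketch of a quantization loss built from a DPLR approximation of the empirical Fisher Information Matrix. The function name, the choice of rank, and the use of per-sample gradients of the block output are illustrative assumptions for this sketch, not the paper's actual formulation; the authors' implementation is in the linked repository.

import torch

def dplr_fisher_loss(delta, grads, rank=4, eps=1e-6):
    """Illustrative DPLR Fisher quantization loss (not the FIMA-Q implementation).

    delta: (B, D) difference between quantized and full-precision block outputs.
    grads: (B, D) per-sample gradients of the task loss w.r.t. the block output,
           giving an empirical Fisher F ~= (1/B) G^T G.
    """
    B, _ = grads.shape
    # Low-rank part of the Fisher via truncated SVD: G ~= U diag(S) V^T,
    # so F ~= V diag(S^2 / B) V^T with V of shape (D, rank).
    _, S, V = torch.svd_lowrank(grads, q=rank)
    scale = S.pow(2) / B                                   # (rank,)
    # Diagonal correction so diag(D + low-rank) matches the empirical Fisher diagonal.
    fisher_diag = grads.pow(2).mean(dim=0)                 # (D,)
    lowrank_diag = (V.pow(2) * scale).sum(dim=1)           # (D,)
    diag = (fisher_diag - lowrank_diag).clamp_min(eps)     # (D,)
    # Quadratic form delta^T (diag + V diag(scale) V^T) delta, averaged over the batch.
    quad_diag = (delta.pow(2) * diag).sum(dim=1)           # (B,)
    proj = delta @ V                                       # (B, rank)
    quad_lowrank = (proj.pow(2) * scale).sum(dim=1)        # (B,)
    return (quad_diag + quad_lowrank).mean()

# Toy usage with random tensors standing in for a block's outputs and gradients.
torch.manual_seed(0)
fp_out = torch.randn(32, 128)
q_out = fp_out + 0.01 * torch.randn(32, 128)
g = torch.randn(32, 128)
loss = dplr_fisher_loss(q_out - fp_out, g)

In a block-by-block PTQ reconstruction loop, a loss of this form would replace the plain mean-squared output error, weighting output perturbations by the (approximate) curvature of the task loss rather than treating all output dimensions equally.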
Related Material

@InProceedings{Wu_2025_CVPR,
    author    = {Wu, Zhuguanyu and Wang, Shihe and Zhang, Jiayi and Chen, Jiaxin and Wang, Yunhong},
    title     = {FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {14891-14900}
}