Fixing Quantization with Lightweight Adapters

Mohammadi, Mohammadreza; Grenier, Matthew; Zand, Ramtin

Mohammadreza Mohammadi, Matthew Grenier, Ramtin Zand; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026, pp. 3569-3578

Abstract

Quantization is an effective strategy for compressing deep neural networks by reducing numerical precision. However, post-training quantization (PTQ) often suffers from significant accuracy degradation at low bit-widths, while quantization-aware training (QAT) is computationally expensive and requires full retraining. We propose a lightweight, sublayer-aware compensation framework that inserts small low-rank adapters into transformer blocks to correct quantization errors. The adapters are optimized using a hybrid objective that combines supervised learning, knowledge distillation, and feature reconstruction to align the quantized model with its full-precision counterpart. Our approach requires only minimal tuning on a small subset of the training data and introduces less than 1% overhead for large models and less than 5% for tiny models, while effectively recovering accuracy lost to low-bit quantization. Extensive experiments across vision and language models, including ViT, Swin, BERT, and GPT-2, demonstrate state-of-the-art performance under aggressive precision settings, with the quantized models in some cases even surpassing the original FP32 accuracy. These results provide a practical pathway for deploying highly quantized models with near- or above-full-precision performance.
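To make the idea in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It shows a frozen, fake-quantized linear layer with a low-rank correction term, plus a loss that sums a supervised term, a distillation term, and a feature-reconstruction term. Everything specific here is an assumption made for illustration: the symmetric per-tensor quantizer, the zero initialization of `B`, the names `fake_quantize`, `AdaptedLinear`, and `hybrid_loss`, the weights `alpha`, `beta`, `gamma`, and the temperature `T`. The abstract does not specify any of these.

```python
import numpy as np


def fake_quantize(w, bits=4):
    """Symmetric per-tensor uniform quantization, then dequantization (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale


class AdaptedLinear:
    """Frozen quantized weight plus a low-rank correction B @ A (hypothetical layout)."""

    def __init__(self, w_fp, rank=2, bits=4, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = w_fp.shape
        self.w_q = fake_quantize(w_fp, bits)              # frozen quantized weight
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, rank))                   # zero init: adapter starts as a no-op

    def __call__(self, x):
        # Quantized path plus low-rank correction path.
        return x @ self.w_q.T + (x @ self.A.T) @ self.B.T

    def overhead(self):
        """Adapter parameters as a fraction of the base weight's parameters."""
        return (self.A.size + self.B.size) / self.w_q.size


def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def hybrid_loss(student_logits, teacher_logits, labels,
                student_feat, teacher_feat,
                alpha=1.0, beta=1.0, gamma=1.0, T=2.0):
    """alpha * cross-entropy + beta * distillation KL + gamma * feature MSE (assumed weighting)."""
    log_p_student = log_softmax(student_logits)
    ce = -log_p_student[np.arange(len(labels)), labels].mean()

    # KL(teacher || student) on temperature-softened distributions.
    log_t = log_softmax(teacher_logits / T)
    log_s = log_softmax(student_logits / T)
    kd = (np.exp(log_t) * (log_t - log_s)).sum(axis=-1).mean() * T ** 2

    rec = np.mean((student_feat - teacher_feat) ** 2)
    return alpha * ce + beta * kd + gamma * rec


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w = rng.normal(size=(64, 64))
    layer = AdaptedLinear(w, rank=2, bits=4)
    x = rng.normal(size=(8, 64))
    print("adapter overhead:", layer.overhead())
    print("output shape:", layer(x).shape)
```

In this sketch only `A` and `B` would be updated during tuning while `w_q` stays fixed, and the relative overhead shrinks as the layer grows for a fixed rank. The figures it prints are properties of this toy layer, not of the models reported in the paper.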

Related Material

@InProceedings{Mohammadi_2026_CVPR,
    author    = {Mohammadi, Mohammadreza and Grenier, Matthew and Zand, Ramtin},
    title     = {Fixing Quantization with Lightweight Adapters},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2026},
    pages     = {3569-3578}
}