VLM-PTQ: Efficient Post-Training Quantization for Large Vision-Language Models

Juncan Deng, Kejie Huang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 24696-24705

Abstract


Post-training quantization (PTQ) serves as a vital technique for efficiently compressing large-scale models, with weight-compensation methods such as GPTQ (symmetric calibration) and GPTAQ (asymmetric calibration) showing remarkable success. However, directly applying these methods to Vision-Language Models (VLMs) exposes two notable shortcomings: 1) the standard round-to-nearest (RTN) method is suboptimal for the asymmetric objective, failing to account for residual-induced shifts in the optimal quantization target; and 2) all input channels are processed uniformly across modalities, overlooking the distinct information densities of vision and language tokens. In this paper, we introduce VLM-PTQ, an asymmetric post-training quantization framework for VLMs. First, we derive a closed-form correction term that shifts the quantization target, which explicitly accounts for the output residual and the corresponding inverse Hessian column, yielding a better local optimum than RTN. Second, we propose a modality-aware quantization that differentiates channel importance between vision and language tokens, allowing the quantizer to pre-compute better quantization parameters through a lightweight search. Our method extends weight-compensation methods with minimal overhead while achieving significant performance improvements in low-bit scenarios. Extensive experiments demonstrate that VLM-PTQ achieves competitive results compared to existing methods, effectively compressing models from 1B to 72B parameters on a single GPU.
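The abstract frames VLM-PTQ as an extension of weight-compensation PTQ such as GPTQ, where each weight column is rounded with RTN and the resulting error is spread onto the not-yet-quantized columns through the inverse Hessian. The sketch below illustrates only that GPTQ-style baseline with the symmetric calibration objective. It is our own simplified illustration, not the authors' code. Assumptions: a symmetric per-output-row 3-bit grid, 1% diagonal dampening, and no blocking or column reordering. The names `rtn` and `gptq_quantize` are hypothetical. The `rtn` call inside the loop marks the rounding step that VLM-PTQ replaces with its closed-form shifted target; that correction term is not reproduced here.

```python
import numpy as np


def rtn(w, scale, bits):
    """Symmetric round-to-nearest onto a signed integer grid (elementwise scale)."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def gptq_quantize(W, X, bits=3, damp=0.01):
    """Hypothetical GPTQ-style sketch: sequential RTN with inverse-Hessian compensation.

    W: (d_out, d_in) weights; X: (n, d_in) calibration inputs.
    Returns quantized weights Q and the per-row scales (assumed symmetric grid).
    """
    W = W.astype(np.float64).copy()
    d_in = W.shape[1]
    qmax = 2 ** (bits - 1) - 1
    # Per-row scales are fixed from the original (uncompensated) weights.
    scale = np.abs(W).max(axis=1) / qmax
    scale[scale == 0] = 1.0
    # Proxy Hessian of the layer-wise squared output error, with dampening.
    H = 2.0 * X.T @ X
    H += damp * np.mean(np.diag(H)) * np.eye(d_in)
    # Upper Cholesky factor U of H^{-1}, so that H^{-1} = U^T U.
    U = np.linalg.cholesky(np.linalg.inv(H)).T
    Q = np.zeros_like(W)
    for i in range(d_in):
        w = W[:, i]
        q = rtn(w, scale, bits)  # the rounding step VLM-PTQ revisits
        Q[:, i] = q
        err = (w - q) / U[i, i]
        # Spread this column's error onto the remaining columns.
        W[:, i + 1:] -= np.outer(err, U[i, i + 1:])
    return Q, scale


# Usage on synthetic data: the inputs share a common component, so they are correlated.
rng = np.random.default_rng(0)
n, d_in, d_out = 512, 64, 32
X = rng.standard_normal((n, d_in)) + 2.0 * rng.standard_normal((n, 1))
W = rng.standard_normal((d_out, d_in))

Q, scale = gptq_quantize(W, X, bits=3)
Q_rtn = rtn(W, scale[:, None], 3)  # plain RTN on the same grid, for comparison
err_gptq = np.linalg.norm(X @ (W - Q).T)
err_rtn = np.linalg.norm(X @ (W - Q_rtn).T)
print(f"output error  RTN: {err_rtn:.2f}  GPTQ-style: {err_gptq:.2f}")
```

Both variants use the same integer grid, so any reduction in output error comes only from the compensation step. This is also why the choice of rounding target for each column, which the paper's first contribution addresses, directly affects the final error.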

Related Material


@InProceedings{Deng_2026_CVPR,
    author    = {Deng, Juncan and Huang, Kejie},
    title     = {VLM-PTQ: Efficient Post-Training Quantization for Large Vision-Language Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {24696-24705}
}