Modality-Aware Bit Allocation for Mixed-Precision Quantization of Vision-Language Models

Xi Zhang, Hanwei Zhu, Jiamang Wang, Xiaolin Wu, Weisi Lin; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026, pp. 9305-9315

Abstract


We present Modality-Aware Bit Allocation (MABA), a mixed-precision post-training quantization framework for vision-language models (VLMs). MABA builds on existing modality-aware quantization strategies and adds an explicit bit allocation stage that distributes precision across model groups according to gradient-guided sensitivity estimates. By formulating bit assignment as a constrained optimization that balances reconstruction error against hardware budgets, MABA enables fine-grained trade-offs between accuracy and memory or latency. Applied to state-of-the-art VLMs such as LLaVA, Qwen-VL, and InternVL, MABA improves multi-modal understanding performance by up to 2.3% over strong uniform-bit baselines at equal or lower bit-width budgets. Our results highlight the effectiveness of structured bit allocation in improving the efficiency and adaptability of large-scale multi-modal models.
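The abstract does not specify MABA's sensitivity estimator or solver. As an illustration only, the Python sketch below shows one generic way to pose the stated trade-off, under loudly stated assumptions: sensitivity is taken as a diagonal empirical-Fisher proxy (sum of squared gradients per group), quantization noise follows the uniform-quantizer model Delta^2 / 12, and bit assignment is solved exactly as a multiple-choice knapsack over a total bit budget. This is not the authors' method; every function name, group name, size and gradient value below is hypothetical.

```python
# Hypothetical sketch: sensitivity-weighted bit allocation under a total bit budget.
# Not the MABA algorithm; group names, sizes and gradients below are toy values.

def fisher_sensitivity(grads):
    """Empirical diagonal-Fisher proxy: sum of squared gradients over a group's weights."""
    return sum(g * g for g in grads)


def quant_error_var(w_range, bits):
    """Rounding-error variance Delta^2 / 12 of a uniform b-bit quantizer spanning w_range."""
    step = w_range / (2 ** bits - 1)
    return step * step / 12.0


def est_loss(group, bits):
    """Second-order proxy for the loss increase: 0.5 * sensitivity * error variance."""
    return 0.5 * group["sens"] * quant_error_var(group["w_range"], bits)


def allocate_bits(groups, bit_choices, budget):
    """Exact multiple-choice knapsack via dynamic programming over integer cost.

    Each group's cost at b bits is group["size"] * b (size in integer units,
    e.g. millions of weights). Returns (estimated_loss, bits_per_group),
    or None if no assignment fits the budget.
    """
    best = {0: (0.0, [])}  # total cost -> (lowest loss so far, bit choices)
    for g in groups:
        nxt = {}
        for cost, (loss, bits) in best.items():
            for b in bit_choices:
                c = cost + g["size"] * b
                if c > budget:
                    continue  # this choice would exceed the bit budget
                cand = loss + est_loss(g, b)
                if c not in nxt or cand < nxt[c][0]:
                    nxt[c] = (cand, bits + [b])
        best = nxt
    return min(best.values(), key=lambda t: t[0]) if best else None


# Toy usage: three hypothetical groups with placeholder gradients.
groups = [
    {"name": "vision_encoder", "size": 3, "w_range": 2.0, "sens": fisher_sensitivity([1.0] * 40)},
    {"name": "projector", "size": 1, "w_range": 2.0, "sens": fisher_sensitivity([1.0] * 90)},
    {"name": "llm_block", "size": 6, "w_range": 2.0, "sens": fisher_sensitivity([0.1] * 20)},
]
budget = 4 * sum(g["size"] for g in groups)  # same total bits as uniform 4-bit
loss, bits = allocate_bits(groups, (2, 4, 8), budget)
uniform_loss = sum(est_loss(g, 4) for g in groups)
print(bits)  # [4, 8, 2]
print(loss < uniform_loss)  # True
```

In this toy setting the sensitive projector is promoted to 8 bits and the insensitive block drops to 2 bits while staying within the uniform 4-bit budget. The dynamic program is exact for integer costs, but its state space grows with the budget, so an allocator over billions of weights would need coarser cost units or a relaxation such as a Lagrangian search.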

Related Material


@InProceedings{Zhang_2026_CVPR,
    author    = {Zhang, Xi and Zhu, Hanwei and Wang, Jiamang and Wu, Xiaolin and Lin, Weisi},
    title     = {Modality-Aware Bit Allocation for Mixed-Precision Quantization of Vision-Language Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
    month     = {June},
    year      = {2026},
    pages     = {9305-9315}
}