Towards Calibrating Prompt Tuning of Vision-Language Models

Sharifdeen, Ashshak; Shamshad, Fahad; Munir, Muhammad Akhtar; Basu, Abhishek; Ismithdeen, Mohamed; Jeyamohan, Jeyapriyan; Silva, Chathurika; Nandakumar, Karthik; Khan, Muhammad Haris

Ashshak Sharifdeen, Fahad Shamshad, Muhammad Akhtar Munir, Abhishek Basu, Mohamed Ismithdeen, Jeyapriyan Jeyamohan, Chathurika Silva, Karthik Nandakumar, Muhammad Haris Khan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 39131-39140

Abstract

Prompt tuning of large-scale vision-language models such as CLIP enables efficient task adaptation without updating model weights. However, it often leads to poor confidence calibration and unreliable predictive uncertainty. We address this problem by proposing a calibration framework that enhances predictive reliability while preserving the geometry of the pretrained CLIP embedding space, which is required for robust generalization. Our approach extends the standard cross-entropy loss with two complementary regularizers: (1) a mean-variance margin penalty that stabilizes inter-class logit margins by maximizing their average while minimizing dispersion, mitigating underconfidence and overconfidence spikes; and (2) a text moment-matching loss that aligns the first and second moments of tuned text embeddings with their frozen CLIP counterparts, preserving semantic dispersion crucial for generalization. Through extensive experiments across 7 prompt-tuning methods and 11 diverse datasets, we demonstrate that our approach significantly reduces the Expected Calibration Error (ECE) compared to competitive calibration techniques on both base and novel classes.
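The abstract describes the training objective only at a high level, so the following dependency-free Python sketch is one possible reading of it, not the authors' implementation. Several details are assumptions: the per-sample margin is taken as the true-class logit minus the largest other-class logit; dispersion is the population variance of those margins over the batch; the "second moment" of the text embeddings is taken as the per-dimension variance across class embeddings (the paper may use a full covariance instead); and the function names and the weights `alpha`, `beta`, and `var_weight` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cross_entropy(logits: Sequence[Vector], labels: Sequence[int]) -> float:
    """Mean softmax cross-entropy over a batch (log-sum-exp for stability)."""
    total = 0.0
    for z, y in zip(logits, labels):
        m = max(z)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        total += log_sum_exp - z[y]
    return total / len(labels)


def margin_penalty(logits: Sequence[Vector], labels: Sequence[int],
                   var_weight: float = 1.0) -> float:
    """Hypothetical mean-variance margin penalty: -mean(m) + var_weight * var(m).

    Assumed margin m_i: true-class logit minus the largest other-class logit.
    Minimizing -mean raises the average margin; the variance term penalizes
    spread (spikes) of the margins across the batch.
    """
    margins: List[float] = []
    for z, y in zip(logits, labels):
        runner_up = max(v for k, v in enumerate(z) if k != y)
        margins.append(z[y] - runner_up)
    mean = sum(margins) / len(margins)
    var = sum((m - mean) ** 2 for m in margins) / len(margins)
    return -mean + var_weight * var


def moments(embeddings: Sequence[Vector]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and population variance across class embeddings."""
    n, d = len(embeddings), len(embeddings[0])
    mean = [sum(e[j] for e in embeddings) / n for j in range(d)]
    var = [sum((e[j] - mean[j]) ** 2 for e in embeddings) / n for j in range(d)]
    return mean, var


def moment_matching_loss(tuned: Sequence[Vector],
                         frozen: Sequence[Vector]) -> float:
    """Squared gap between the moments of tuned and frozen text embeddings.

    Assumption: second moment = per-dimension variance, not full covariance.
    """
    mu_t, var_t = moments(tuned)
    mu_f, var_f = moments(frozen)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_t, mu_f))
    var_term = sum((a - b) ** 2 for a, b in zip(var_t, var_f))
    return mean_term + var_term


def total_loss(logits, labels, tuned_text, frozen_text,
               alpha: float = 0.1, beta: float = 1.0) -> float:
    """Cross-entropy plus both regularizers; alpha and beta are hypothetical weights."""
    return (cross_entropy(logits, labels)
            + alpha * margin_penalty(logits, labels)
            + beta * moment_matching_loss(tuned_text, frozen_text))


if __name__ == "__main__":
    logits = [[2.0, 1.0, 0.0], [0.5, 1.5, 0.0]]
    labels = [0, 1]
    frozen = [[0.0, 0.0], [2.0, 2.0]]  # toy stand-in for frozen CLIP class embeddings
    tuned = [[1.0, 1.0], [3.0, 3.0]]   # same spread, mean shifted by +1
    print(margin_penalty(logits, labels))       # -1.0 (both margins equal 1.0)
    print(moment_matching_loss(tuned, frozen))  # 2.0 (mean shift only, variances match)
    print(total_loss(logits, labels, tuned, frozen))
```

Because the `-mean` term rewards ever-larger margins, its weight matters; in CLIP the logits are scaled cosine similarities, so the margins stay bounded. Matching moments across the whole set of class embeddings, rather than pulling each tuned embedding onto its frozen counterpart, constrains the overall spread of the text embeddings while leaving individual prompts free to move.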

BibTeX

@InProceedings{Sharifdeen_2026_CVPR,
    author    = {Sharifdeen, Ashshak and Shamshad, Fahad and Munir, Muhammad Akhtar and Basu, Abhishek and Ismithdeen, Mohamed and Jeyamohan, Jeyapriyan and Silva, Chathurika and Nandakumar, Karthik and Khan, Muhammad Haris},
    title     = {Towards Calibrating Prompt Tuning of Vision-Language Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {39131-39140}
}