Rethinking Few Shot CLIP Benchmarks: A Critical Analysis in the Inductive Setting

Alexey Kravets, Da Chen, Vinay P. Namboodiri; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 1902-1911

Abstract

CLIP is a foundation model with transferable classification performance in the few-shot setting. Several methods have improved CLIP's few-shot performance, but so far all of them have been benchmarked on standard few-shot datasets. We argue that this mode of evaluation does not give a true indication of inductive generalization from few-shot examples: because most of these datasets have been seen by the CLIP model during pre-training, the resulting setting is better described as partially transductive. To address this, we propose a pipeline that uses an unlearning technique to obtain true inductive baselines. In this new inductive setting, existing methods show a significant drop in performance (-55% on average across 13 baselines and multiple datasets). We validate the unlearning technique using oracle baselines. We further propose an improved few-shot classification technique that consistently achieves state-of-the-art performance over 13 recent baseline methods in a comprehensive analysis of 5880 experiments, varying the datasets, the number of few-shot examples, the unlearning setting, and the random seeds. In sum, we identify a flaw in the evaluation of CLIP-based few-shot classification, provide a solution using unlearning, propose new benchmarks, and introduce an improved method.
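For concreteness, the sketch below illustrates the kind of few-shot CLIP classifier that such benchmarks evaluate: class prototypes built either from CLIP text embeddings (zero-shot) or from the mean embedding of the k support images per class (few-shot), with queries assigned to the nearest prototype by cosine similarity. It is a minimal sketch assuming OpenAI's clip package; the class names and the prototype classifier are illustrative placeholders, not the authors' proposed method.

    # Minimal few-shot CLIP classifier of the kind these benchmarks evaluate.
    # Illustrative sketch only; not the authors' method. Assumes OpenAI's
    # `clip` package (pip install git+https://github.com/openai/CLIP.git).
    import torch
    import clip

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    class_names = ["cat", "dog"]  # hypothetical label set

    @torch.no_grad()
    def text_prototypes(names):
        # Zero-shot prototypes: one normalized text embedding per class.
        tokens = clip.tokenize([f"a photo of a {n}" for n in names]).to(device)
        feats = model.encode_text(tokens).float()
        return feats / feats.norm(dim=-1, keepdim=True)

    @torch.no_grad()
    def image_prototypes(support_images, support_labels, num_classes):
        # Few-shot prototypes: mean normalized embedding of the k support
        # images per class (inputs are batches produced by `preprocess`).
        feats = model.encode_image(support_images.to(device)).float()
        feats = feats / feats.norm(dim=-1, keepdim=True)
        labels = support_labels.to(device)
        protos = torch.stack([feats[labels == c].mean(dim=0) for c in range(num_classes)])
        return protos / protos.norm(dim=-1, keepdim=True)

    @torch.no_grad()
    def classify(query_images, prototypes):
        # Assign each query image to the nearest prototype (cosine similarity).
        feats = model.encode_image(query_images.to(device)).float()
        feats = feats / feats.norm(dim=-1, keepdim=True)
        return (feats @ prototypes.T).argmax(dim=-1)

Under the partially transductive setting the paper identifies, such a classifier benefits from CLIP having already seen the benchmark classes during pre-training; the paper's unlearning pipeline removes that advantage before few-shot accuracy is measured.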

Related Material

@InProceedings{Kravets_2025_ICCV,
    author    = {Kravets, Alexey and Chen, Da and Namboodiri, Vinay P.},
    title     = {Rethinking Few Shot CLIP Benchmarks: A Critical Analysis in the Inductive Setting},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {1902-1911}
}