DeIL: Direct-and-Inverse CLIP for Open-World Few-Shot Learning

Shuai Shao, Yu Bai, Yan Wang, Baodi Liu, Yicong Zhou; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 28505-28514

Abstract


Open-World Few-Shot Learning (OFSL) is a critical field of research that focuses on accurately identifying target samples in environments with scarce data and unreliable labels, and thus has substantial practical significance. Recently, the evolution of foundation models such as CLIP has revealed their strong representation capacity even in settings with limited resources and data. This development has prompted a significant shift in focus, from the traditional approach of "building models from scratch" to a strategy centered on "efficiently utilizing the capabilities of foundation models to extract relevant prior knowledge tailored for OFSL and apply it judiciously". Against this backdrop, we introduce Direct-and-Inverse CLIP (DeIL), a method that leverages our proposed "Direct-and-Inverse" concept to activate CLIP-based methods for OFSL. This concept transforms conventional single-step classification into a two-stage process: first filtering out the less probable categories, then accurately determining the specific category of each sample. DeIL comprises two key components: a frozen pre-trainer for data denoising and a tunable adapter for the final, precise classification. In experiments, DeIL achieves state-of-the-art performance on 11 datasets.
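To make the two-stage "filter, then decide" idea concrete, the following is a minimal sketch of a direct-and-inverse CLIP classifier. The prompt templates, the candidate count k, and the file query.jpg are illustrative assumptions for this sketch only; the paper's full DeIL pipeline additionally includes the frozen pre-trainer for denoising and the tunable adapter, which are not shown here.

    import torch
    import clip  # OpenAI CLIP: https://github.com/openai/CLIP
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    class_names = ["cat", "dog", "car", "airplane", "bird"]  # toy label set

    # Direct and inverse prompt templates (illustrative, not the paper's exact ones).
    direct = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
    inverse = clip.tokenize([f"a photo without any {c}" for c in class_names]).to(device)

    image = preprocess(Image.open("query.jpg")).unsqueeze(0).to(device)  # placeholder image

    with torch.no_grad():
        img = model.encode_image(image)
        dir_txt = model.encode_text(direct)
        inv_txt = model.encode_text(inverse)
        # L2-normalize so dot products are cosine similarities.
        img = img / img.norm(dim=-1, keepdim=True)
        dir_txt = dir_txt / dir_txt.norm(dim=-1, keepdim=True)
        inv_txt = inv_txt / inv_txt.norm(dim=-1, keepdim=True)

        # Stage 1 (inverse): a high similarity to "a photo without any {c}" suggests
        # class c is unlikely, so keep only the k classes with the LOWEST inverse score.
        inv_sims = (img @ inv_txt.T).squeeze(0)
        k = 3
        keep = inv_sims.topk(k, largest=False).indices

        # Stage 2 (direct): classify among the surviving candidates.
        dir_sims = (img @ dir_txt[keep].T).squeeze(0)
        pred = keep[dir_sims.argmax()].item()

    print("predicted:", class_names[pred])

With k equal to the number of classes the sketch reduces to ordinary single-step CLIP zero-shot classification; the pruning stage only matters when the inverse scores actually eliminate distractor categories.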

Related Material


@InProceedings{Shao_2024_CVPR,
    author    = {Shao, Shuai and Bai, Yu and Wang, Yan and Liu, Baodi and Zhou, Yicong},
    title     = {DeIL: Direct-and-Inverse CLIP for Open-World Few-Shot Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {28505-28514}
}