Few-shot Learner Parameterization by Diffusion Time-steps

Zhongqi Yue, Pan Zhou, Richang Hong, Hanwang Zhang, Qianru Sun; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 23263-23272

Abstract


Even when using large multi-modal foundation models, few-shot learning is still challenging -- without a proper inductive bias, it is nearly impossible to keep the nuanced class attributes while removing the visually prominent attributes that spuriously correlate with class labels. To this end, we find an inductive bias: the time-steps of a Diffusion Model (DM) can isolate the nuanced class attributes, i.e., as the forward diffusion adds noise to an image at each time-step, nuanced attributes are usually lost at an earlier time-step than the spurious attributes that are visually prominent. Building on this, we propose the Time-step Few-shot (TiF) learner. We train class-specific low-rank adapters for a text-conditioned DM to make up for the lost attributes, such that images can be accurately reconstructed from their noisy versions given a prompt. Hence, at a small time-step, the adapter and prompt are essentially a parameterization of only the nuanced class attributes. For a test image, we can use this parameterization to extract only the nuanced class attributes for classification. The TiF learner significantly outperforms OpenCLIP and its adapters on a variety of fine-grained and customized few-shot learning tasks. Code is available at https://github.com/yue-zhongqi/tif.
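
For intuition, the test-time rule described in the abstract can be sketched as follows. This is a minimal sketch, not the authors' released implementation: it assumes a Stable-Diffusion-style noise-prediction model, and the helpers `load_unet_with_adapter` and `encode_prompt` (loading the frozen DM with a class-specific LoRA adapter, and encoding a class prompt) are hypothetical placeholders. The idea is that at small time-steps, only the nuanced class attributes have been destroyed by the forward noise, so the class whose adapter and prompt best reconstruct them yields the lowest denoising error.

```python
# Sketch: classify a test image by the per-class denoising error at small
# time-steps. `load_unet_with_adapter` and `encode_prompt` are hypothetical
# stand-ins, not functions from the released repository.
import torch

@torch.no_grad()
def classify(x0, class_names, load_unet_with_adapter, encode_prompt,
             alphas_cumprod, small_timesteps=(10, 50, 100), n_trials=4):
    """Predict the class whose adapter + prompt best denoises x0 at small t."""
    errors = []
    for name in class_names:
        unet = load_unet_with_adapter(name)       # frozen DM + class-specific LoRA
        cond = encode_prompt(f"a photo of {name}")
        err = 0.0
        for t in small_timesteps:                 # small t: only nuanced attributes are noised away
            a_bar = alphas_cumprod[t]
            for _ in range(n_trials):             # average over noise draws
                eps = torch.randn_like(x0)
                # forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps
                x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
                eps_hat = unet(x_t, t, cond)      # predicted noise under this class
                err += torch.mean((eps - eps_hat) ** 2).item()
        errors.append(err)
    # lowest reconstruction error = best match of the nuanced class attributes
    return class_names[int(torch.tensor(errors).argmin())]
```

The restriction to small time-steps is the key design choice: at large time-steps the visually prominent (spurious) attributes are also destroyed, so the error would conflate them with the nuanced class attributes.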

Related Material


@InProceedings{Yue_2024_CVPR,
    author    = {Yue, Zhongqi and Zhou, Pan and Hong, Richang and Zhang, Hanwang and Sun, Qianru},
    title     = {Few-shot Learner Parameterization by Diffusion Time-steps},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {23263-23272}
}