Mini but Mighty: Finetuning ViTs With Mini Adapters

Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 1732-1741

Abstract


Vision Transformers (ViTs) have become one of the dominant architectures in computer vision, and pre-trained ViT models are commonly adapted to new tasks via fine-tuning. Recent works have proposed several parameter-efficient transfer learning methods, such as adapters, to avoid the prohibitive training and storage costs of full fine-tuning. In this work, we observe that adapters perform poorly when their hidden dimension is small, and we propose MiMi, a training framework that addresses this issue. We start with large adapters, which can reach high performance, and iteratively reduce the size of every adapter. We introduce a scoring function that compares neuron importance across layers and thereby allows automatic estimation of the hidden dimension of every adapter. Our method outperforms existing methods in finding the best trade-off between accuracy and trained parameters across three benchmarks (DomainNet, VTAB, and Multi-task), spanning 29 datasets in total. We will release our code publicly upon acceptance.
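To make the abstract's two ingredients concrete, the sketch below shows a bottleneck adapter (down-projection, nonlinearity, up-projection, residual connection) and an iterative shrinking step that keeps only the highest-scoring hidden neurons. This is a minimal NumPy illustration, not the authors' implementation: the scoring function here is a simple weight-magnitude heuristic standing in for the paper's cross-layer importance score, and all names and dimensions are hypothetical.

```python
import numpy as np

def adapter_forward(x, W_down, W_up):
    """Bottleneck adapter: project down (d -> r), ReLU, project up (r -> d),
    then add the residual so the output dimension matches the input."""
    h = np.maximum(x @ W_down, 0.0)
    return x + h @ W_up

def prune_adapter(W_down, W_up, keep):
    """Shrink the adapter's hidden dimension by keeping the `keep` neurons
    with the largest combined in/out weight norms (a stand-in heuristic for
    the paper's cross-layer scoring function)."""
    scores = np.linalg.norm(W_down, axis=0) * np.linalg.norm(W_up, axis=1)
    idx = np.argsort(scores)[-keep:]
    return W_down[:, idx], W_up[idx, :]

rng = np.random.default_rng(0)
d, r = 16, 8                         # feature dim, initial (large) hidden dim
W_down = rng.normal(size=(d, r)) * 0.1
W_up = rng.normal(size=(r, d)) * 0.1
x = rng.normal(size=(2, d))          # a toy batch of 2 token features

y_large = adapter_forward(x, W_down, W_up)
W_down_s, W_up_s = prune_adapter(W_down, W_up, keep=2)  # one shrinking step
y_small = adapter_forward(x, W_down_s, W_up_s)
print(y_large.shape, y_small.shape)  # output shape is unchanged by pruning
```

In the paper's framework, such a shrinking step would be applied iteratively during training, with the per-layer `keep` determined automatically from the cross-layer scores rather than fixed by hand as above.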

Related Material


[bibtex]
@InProceedings{Marouf_2024_WACV,
  author    = {Marouf, Imad Eddine and Tartaglione, Enzo and Lathuili\`ere, St\'ephane},
  title     = {Mini but Mighty: Finetuning ViTs With Mini Adapters},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {January},
  year      = {2024},
  pages     = {1732-1741}
}