Vision Transformer Adapters for Generalizable Multitask Learning

Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 19015-19026

Abstract

We introduce the first multitasking vision transformer adapters that learn generalizable task affinities applicable to novel tasks and domains. Integrated into an off-the-shelf vision transformer backbone, our adapters simultaneously solve multiple dense vision tasks in a parameter-efficient manner, unlike existing multitasking transformers, which are parametrically expensive. In contrast to concurrent methods, our approach requires no retraining or fine-tuning when a new task or domain is added. We introduce a task-adapted attention mechanism within our adapter framework that combines gradient-based task similarities with attention-based ones. The learned task affinities generalize to three settings: zero-shot task transfer, unsupervised domain adaptation, and generalization to novel domains without fine-tuning. We demonstrate that our approach outperforms not only existing convolutional neural network-based multitasking methods but also vision transformer-based ones. Our project page is at https://ivrl.github.io/VTAGML.
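The abstract describes the task-adapted attention mechanism only at a high level. As a rough sketch of the idea, the Python snippet below blends a precomputed gradient-based task-similarity matrix with attention-based affinities between per-task tokens. The function name, tensor shapes, and scalar mixing weight alpha are illustrative assumptions, not the authors' implementation; see the paper and project page for the actual formulation.

import torch
import torch.nn.functional as F

def task_adapted_attention(task_tokens: torch.Tensor,
                           grad_sim: torch.Tensor,
                           alpha: float = 0.5) -> torch.Tensor:
    # task_tokens: (T, D) -- one embedding per task (hypothetical shape).
    # grad_sim:    (T, T) -- precomputed gradient-based task similarity,
    #              e.g. cosine similarity between per-task loss gradients.
    d = task_tokens.shape[-1]
    # Attention-based task affinities: scaled dot products of task tokens.
    attn_sim = task_tokens @ task_tokens.T / d ** 0.5
    # Blend the two affinity sources and normalize into attention weights.
    affinity = F.softmax(alpha * attn_sim + (1.0 - alpha) * grad_sim, dim=-1)
    # Re-weight task representations by the blended affinities.
    return affinity @ task_tokens

# Toy usage: three tasks, 64-dimensional task embeddings.
tokens = torch.randn(3, 64)
grads = torch.randn(3, 100)                      # stand-in per-task gradients
grad_sim = F.cosine_similarity(grads.unsqueeze(1), grads.unsqueeze(0), dim=-1)
out = task_adapted_attention(tokens, grad_sim)   # (3, 64)

Applying a softmax over the blended scores makes the two affinity sources directly comparable as attention weights; how the paper actually fuses them may differ.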

Related Material

BibTeX:
@InProceedings{Bhattacharjee_2023_ICCV,
    author    = {Bhattacharjee, Deblina and S\"usstrunk, Sabine and Salzmann, Mathieu},
    title     = {Vision Transformer Adapters for Generalizable Multitask Learning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {19015-19026}
}