PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers

Maximilian Augustin, Syed Shakib Sarwar, Mostafa Elhoushi, Yuecheng Li, Sai Qian Zhang, Barbara De Salvo; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2025, pp. 1884-1894

Abstract


Transformers have revolutionized natural language processing (NLP) and are increasingly influential in computer vision tasks. Despite their strong performance and multi-tasking capabilities, transformers' high computational demands limit their applicability in resource-constrained environments, where convolutional or hybrid models (combining convolution and attention layers) often excel, particularly in the sub-100M parameter range. While parameter-efficient task adaptation techniques have been successful in NLP, they have not been widely adopted for hybrid transformers in vision tasks. In this work, we introduce PETAH (Parameter Efficient Task Adaptation for Hybrid Transformers), a novel framework for efficiently adapting hybrid transformers to new tasks. We further combine PETAH with pruning to create high-performing and storage-efficient models suitable for multi-tasking. Our extensive evaluations on classification and other vision tasks demonstrate that PETAH-adapted hybrid models outperform established task-adaptation techniques for Vision Transformers (ViTs), requiring fewer parameters and achieving greater efficiency on mobile hardware.
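The abstract refers to parameter-efficient task adaptation techniques from NLP; the paper's own mechanism is detailed in the full text. As a generic, illustrative sketch of the underlying principle (not PETAH's actual method), the snippet below shows a LoRA-style low-rank adapter: the pretrained weight `W` stays frozen, and each new task trains only two small factors `A` and `B`. All names here (`matmul`, `effective_weight`, the toy dimensions) are hypothetical and chosen for illustration.

```python
# Illustrative LoRA-style low-rank adapter (pure Python, no frameworks).
# W is a frozen pretrained weight matrix; only the small factors A (d_out x r)
# and B (r x d_in) are trained per task, so each task stores r * (d_in + d_out)
# extra parameters instead of a full d_out * d_in matrix.

def matmul(X, Y):
    """Naive matrix product of two lists-of-lists."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def madd(X, Y):
    """Element-wise matrix addition."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def effective_weight(W, A, B):
    """Adapted weight W + A @ B; W stays frozen, A and B are task-specific."""
    return madd(W, matmul(A, B))

# Toy example: d_out = d_in = 2, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]          # frozen backbone weight
A = [[0.5], [0.5]]                    # trainable factor (d_out x r)
B = [[1.0, -1.0]]                     # trainable factor (r x d_in)
W_task = effective_weight(W, A, B)    # -> [[1.5, -0.5], [0.5, 0.5]]
```

With rank r much smaller than the weight dimensions, the per-task storage cost grows linearly rather than quadratically in the layer width, which is what makes such adapters attractive for the multi-task, storage-constrained setting the abstract describes.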

Related Material


@InProceedings{Augustin_2025_CVPR,
  author    = {Augustin, Maximilian and Sarwar, Syed Shakib and Elhoushi, Mostafa and Li, Yuecheng and Zhang, Sai Qian and De Salvo, Barbara},
  title     = {PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2025},
  pages     = {1884-1894}
}