@InProceedings{He_2024_CVPR,
    author    = {He, Haoyu and Pan, Zizheng and Liu, Jing and Cai, Jianfei and Zhuang, Bohan},
    title     = {Efficient Stitchable Task Adaptation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {28555-28565}
}
Efficient Stitchable Task Adaptation
Abstract
The paradigm of pre-training and fine-tuning has laid the foundation for deploying deep learning models. However, most fine-tuning methods are designed to meet a specific resource budget. Recently, considering diverse deployment scenarios with various resource budgets, SN-Net was introduced to quickly obtain numerous new networks (stitches) from the pre-trained models (anchors) in a model family via model stitching. Although promising, SN-Net confronts new challenges when adapted to new target domains, including huge memory and storage requirements and a long, sub-optimal multistage adaptation process. In this work, we present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models that adhere to diverse resource constraints. Specifically, we first tailor parameter-efficient fine-tuning to share low-rank updates among the stitches while maintaining independent bias terms. In this way, we largely reduce fine-tuning memory burdens and mitigate the interference among stitches that arises in task adaptation. Furthermore, we streamline a simple yet effective one-stage deployment pipeline, which estimates the important stitches to deploy with training-time gradient statistics. By assigning higher sampling probabilities to important stitches, we also get a boosted Pareto frontier. Extensive experiments on 25 downstream visual recognition tasks demonstrate that our ESTA is capable of generating stitches with smooth accuracy-efficiency trade-offs and surpasses the direct SN-Net adaptation by remarkable margins with significantly lower training time and fewer trainable parameters. Furthermore, we demonstrate the flexibility and scalability of our ESTA framework by stitching LLMs from the LLaMA family, obtaining chatbot stitches of assorted sizes.
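As a rough illustration of two ideas named in the abstract, the sketch below shows a linear layer whose low-rank update is shared by all stitches while each stitch keeps its own bias, and a helper that turns importance scores into sampling probabilities. It is not the authors' implementation: the class `SharedLowRankLinear`, the function `sampling_probs`, the random stand-in for the pre-trained weight, and the example importance scores are all hypothetical, and the abstract does not say how importance is computed from gradient statistics.

```python
import numpy as np


class SharedLowRankLinear:
    """Frozen weight plus one low-rank update shared by all stitches.

    Hypothetical sketch: each stitch only owns an independent bias vector.
    """

    def __init__(self, d_in, d_out, rank, stitch_ids, seed=0):
        rng = np.random.default_rng(seed)
        # Random stand-in for a frozen pre-trained weight (assumption).
        self.weight = rng.standard_normal((d_out, d_in)) * 0.02
        # Shared trainable low-rank factors; B starts at zero so the
        # initial update B @ A is zero.
        self.lora_a = rng.standard_normal((rank, d_in)) * 0.02
        self.lora_b = np.zeros((d_out, rank))
        # One independent bias per stitch.
        self.bias = {s: np.zeros(d_out) for s in stitch_ids}

    def forward(self, x, stitch_id):
        delta = self.lora_b @ self.lora_a  # shared low-rank update
        return x @ (self.weight + delta).T + self.bias[stitch_id]


def sampling_probs(importance):
    """Normalize non-negative importance scores into probabilities."""
    scores = np.asarray(importance, dtype=float)
    return scores / scores.sum()


# Usage: two stitches share the low-rank update but differ in bias.
layer = SharedLowRankLinear(d_in=8, d_out=4, rank=2, stitch_ids=["s0", "s1"])
x = np.ones((3, 8))
layer.bias["s1"] += 1.0
out0 = layer.forward(x, "s0")
out1 = layer.forward(x, "s1")

# Made-up importance scores; a more important stitch is sampled more often.
probs = sampling_probs([1.0, 3.0])
rng = np.random.default_rng(0)
sampled = rng.choice(["s0", "s1"], size=5, p=probs)
```

Under these assumptions, the only per-stitch trainable state is a bias vector, which is where the reduction in trainable parameters described above would come from.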