Split Adaptation for Pre-trained Vision Transformers

Lixu Wang, Bingqi Shang, Yi Li, Payal Mohapatra, Wei Dong, Xiao Wang, Qi Zhu; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 20092-20102

Abstract


Vision Transformers (ViTs), extensively pre-trained on large-scale datasets, have become fundamental to foundation models, enabling adaptation to diverse downstream tasks. Existing adaptation methods typically require direct data access, rendering them infeasible in privacy-sensitive domains where clients are often reluctant to share their data. A straightforward solution would be to send the pre-trained ViT to clients for local adaptation, but this risks exposing the model's intellectual property and imposes heavy computation overhead on clients. To address these issues, we propose a novel split adaptation (SA) method that enables effective downstream adaptation while protecting both data and models. SA, inspired by split learning (SL), segments the pre-trained ViT into a frontend and a backend, with only the frontend shared with the client for data representation extraction. Unlike regular SL, however, SA replaces the frontend parameters with low-bit quantized values, preventing direct exposure of the model. SA further allows the client to add bi-level noise to the frontend and to the extracted data representations, ensuring data protection. Accordingly, SA incorporates data-level and model-level out-of-distribution enhancements to mitigate the impact of noise injection. Our SA focuses on the challenging few-shot adaptation setting and adopts patch retrieval augmentation to alleviate overfitting. Extensive experiments on multiple datasets validate SA's superiority over state-of-the-art methods and demonstrate its defense against advanced data reconstruction attacks, while preventing model leakage with minimal computation cost on the client side. The source code can be found at https://github.com/conditionWang/Split_Adaptation.
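The protection mechanism described above — the server sharing only a low-bit quantized frontend, and the client injecting bi-level noise into both the frontend weights and the extracted representations before returning them — can be illustrated with a toy sketch. This is a minimal illustration, not the paper's implementation: the frontend is reduced to a single hypothetical linear layer, and `quantize`, the noise scales, and all variable names are assumptions for exposition.

```python
import numpy as np

def quantize(w, bits=4):
    """Uniform low-bit quantization: snap weights to 2**bits levels."""
    lo, hi = w.min(), w.max()
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    return np.round((w - lo) / scale) * scale + lo, scale

rng = np.random.default_rng(0)
W_front = rng.normal(size=(16, 8))   # toy stand-in for the ViT frontend

# Server side: share only a quantized frontend (protects the model).
W_q, scale = quantize(W_front, bits=4)

# Client side: bi-level noise -- perturb the received frontend weights
# and, later, the representations extracted with them.
W_noisy = W_q + rng.normal(scale=0.1 * scale, size=W_q.shape)
x = rng.normal(size=(4, 16))         # toy client data batch
reps = x @ W_noisy                   # representation extraction
reps_noisy = reps + rng.normal(scale=0.05, size=reps.shape)
# Only reps_noisy would be sent to the server-side backend.
```

The quantized weights take at most 2^4 = 16 distinct values, so the client never observes the full-precision frontend; the noise on weights and representations is what the paper's out-of-distribution enhancements are designed to compensate for.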

Related Material


@InProceedings{Wang_2025_CVPR,
    author    = {Wang, Lixu and Shang, Bingqi and Li, Yi and Mohapatra, Payal and Dong, Wei and Wang, Xiao and Zhu, Qi},
    title     = {Split Adaptation for Pre-trained Vision Transformers},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {20092-20102}
}