EA-ViT: Efficient Adaptation for Elastic Vision Transformer

Zhu, Chen; Zhao, Wangbo; Zhang, Huiwen; Zhou, Yuhao; Tang, Weidong; Wang, Shuo; Yuan, Zhihang; Shang, Yuzhang; Peng, Xiaojiang; Wang, Kai; Yang, Dawei

Chen Zhu, Wangbo Zhao, Huiwen Zhang, Yuhao Zhou, Weidong Tang, Shuo Wang, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Kai Wang, Dawei Yang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 1038-1047

Abstract

Vision Transformers (ViTs) have emerged as foundational models in computer vision, excelling in generalization and adaptation to downstream tasks. However, deploying ViTs to support diverse resource constraints typically requires retraining multiple, size-specific ViTs, which is both time-consuming and energy-intensive. To address this issue, we propose an efficient ViT adaptation framework that enables a single adaptation process to generate multiple models of varying sizes for deployment on platforms with various resource constraints. Our approach comprises two stages. In the first stage, we enhance a pre-trained ViT with a nested elastic architecture that enables structural flexibility across MLP expansion ratio, number of attention heads, embedding dimension, and network depth. To preserve pre-trained knowledge and ensure stable adaptation, we adopt a curriculum-based training strategy that progressively increases elasticity. In the second stage, we design a lightweight router to select submodels according to computational budgets and downstream task demands. Initialized with Pareto-optimal configurations derived via a customized NSGA-II algorithm, the router is then jointly optimized with the backbone. Extensive experiments on multiple benchmarks demonstrate the effectiveness and versatility of EA-ViT. The code is available at https://github.com/zcxcf/EA-ViT.
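The nested elastic idea in the first stage can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how submodel weights are chosen, so the prefix slicing below is an assumption, and the names `make_linear`, `slice_linear`, and `elastic_mlp` are hypothetical. Biases, attention heads, and depth are left out, and ReLU stands in for the activation. The sketch shows only how one set of full-size MLP weights could serve several embedding dimensions and MLP expansion ratios.

```python
import random


def make_linear(out_dim, in_dim, seed=0):
    """Random weight matrix (list of rows) standing in for pre-trained weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def slice_linear(weight, out_keep, in_keep):
    """Keep the first out_keep rows and first in_keep columns.

    Assumption: submodels are nested prefixes of the full weights.
    """
    return [row[:in_keep] for row in weight[:out_keep]]


def linear(weight, x):
    """Matrix-vector product without bias."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def elastic_mlp(w1, w2, x, embed_dim, mlp_ratio):
    """Run an MLP block at a reduced embedding dimension and expansion ratio."""
    hidden = int(embed_dim * mlp_ratio)
    w1_sub = slice_linear(w1, hidden, embed_dim)   # hidden x embed_dim
    w2_sub = slice_linear(w2, embed_dim, hidden)   # embed_dim x hidden
    h = [max(0.0, v) for v in linear(w1_sub, x[:embed_dim])]  # ReLU as a stand-in
    return linear(w2_sub, h)


# Toy full-size block: embedding dimension 8, MLP expansion ratio 4.
FULL_DIM, FULL_RATIO = 8, 4
W1 = make_linear(FULL_DIM * FULL_RATIO, FULL_DIM, seed=1)
W2 = make_linear(FULL_DIM, FULL_DIM * FULL_RATIO, seed=2)
x = [0.1 * i for i in range(FULL_DIM)]

small = elastic_mlp(W1, W2, x, embed_dim=4, mlp_ratio=2)  # submodel
full = elastic_mlp(W1, W2, x, embed_dim=8, mlp_ratio=4)   # full model
```

Because every submodel reads a prefix of the same matrices, no extra weights are stored per size. Under this assumption, a router would only need to output a configuration such as `(embed_dim, mlp_ratio)` for a given budget.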

Related Material

[bibtex]

@InProceedings{Zhu_2025_ICCV,
    author    = {Zhu, Chen and Zhao, Wangbo and Zhang, Huiwen and Zhou, Yuhao and Tang, Weidong and Wang, Shuo and Yuan, Zhihang and Shang, Yuzhang and Peng, Xiaojiang and Wang, Kai and Yang, Dawei},
    title     = {EA-ViT: Efficient Adaptation for Elastic Vision Transformer},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {1038-1047}
}