A General and Efficient Training for Transformer via Token Expansion

Huang, Wenxuan; Shen, Yunhang; Xie, Jiao; Zhang, Baochang; He, Gaoqi; Li, Ke; Sun, Xing; Lin, Shaohui

Wenxuan Huang, Yunhang Shen, Jiao Xie, Baochang Zhang, Gaoqi He, Ke Li, Xing Sun, Shaohui Lin; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 15783-15792

Abstract

The remarkable performance of Vision Transformers (ViTs) typically requires an extremely large training cost. Existing methods have attempted to accelerate the training of ViTs, yet they typically disregard method universality and suffer accuracy drops. Meanwhile, they break the training consistency of the original transformers, including the consistency of hyper-parameters, architecture, and strategy, which prevents them from being widely applied to different Transformer networks. In this paper, we propose a novel token growth scheme, Token Expansion (termed ToE), to achieve consistent training acceleration for ViTs. We introduce an "initialization-expansion-merging" pipeline to maintain the integrity of the intermediate feature distribution of original transformers, preventing the loss of crucial learnable information in the training process. ToE can not only be seamlessly integrated into the training and fine-tuning process of transformers (e.g., DeiT and LV-ViT), but is also effective for efficient training frameworks (e.g., EfficientTrain), without altering the original training hyper-parameters or architecture and without introducing additional training strategies. Extensive experiments demonstrate that ToE makes the training of ViTs about 1.3x faster in a lossless manner, or even with performance gains over the full-token training baselines. Code is available at https://github.com/Osilly/TokenExpansion.

Related Material

[bibtex]

@InProceedings{Huang_2024_CVPR,
    author    = {Huang, Wenxuan and Shen, Yunhang and Xie, Jiao and Zhang, Baochang and He, Gaoqi and Li, Ke and Sun, Xing and Lin, Shaohui},
    title     = {A General and Efficient Training for Transformer via Token Expansion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {15783-15792}
}