Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters

Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Ping Hu, Dong Wang, Huchuan Lu, You He; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 23219-23230

Abstract


Continual learning can empower vision-language models to continuously acquire new knowledge without the need for access to the entire historical dataset. However mitigating the performance degradation in large-scale models is non-trivial due to (i) parameter shifts throughout lifelong learning and (ii) significant computational burdens associated with full-model tuning. In this work we present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models. Our approach involves the dynamic expansion of a pre-trained CLIP model through the integration of Mixture-of-Experts (MoE) adapters in response to new tasks. To preserve the zero-shot recognition capability of vision-language models we further introduce a Distribution Discriminative Auto-Selector (DDAS) that automatically routes in-distribution and out-of-distribution inputs to the MoE Adapter and the original CLIP respectively. Through extensive experiments across various settings our proposed method consistently outperforms previous state-of-the-art approaches while concurrently reducing parameter training burdens by 60%.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Yu_2024_CVPR, author = {Yu, Jiazuo and Zhuge, Yunzhi and Zhang, Lu and Hu, Ping and Wang, Dong and Lu, Huchuan and He, You}, title = {Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {23219-23230} }