[bibtex]
@InProceedings{Lee_2023_WACV,
  author    = {Lee, Kuan-Ying and Zhong, Yuanyi and Wang, Yu-Xiong},
  title     = {Do Pre-Trained Models Benefit Equally in Continual Learning?},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {January},
  year      = {2023},
  pages     = {6485-6493}
}
Do Pre-Trained Models Benefit Equally in Continual Learning?
Abstract
A large part of the continual learning (CL) literature focuses on developing algorithms for models trained from scratch. While these algorithms perform well with from-scratch trained models on widely used CL benchmarks, they show dramatic performance drops on more complex datasets (e.g., Split-CUB200). Pre-trained models, widely used to transfer knowledge to downstream tasks, could make these methods applicable in more realistic scenarios. However, surprisingly, the improvements CL algorithms gain from pre-training are inconsistent. For instance, while Incremental Classifier and Representation Learning (iCaRL) underperforms Supervised Contrastive Replay (SCR) when trained from scratch, it outperforms SCR when both are initialized with a pre-trained model. This indicates that the paradigm the current CL literature follows, where all methods are compared under from-scratch training, does not accurately reflect the true CL objective and desired progress. Furthermore, we find that 1) CL algorithms that exert less regularization benefit more from a pre-trained model; and 2) a model pre-trained on a larger dataset (WebImageText, used in Contrastive Language-Image Pre-training (CLIP), vs. ImageNet) does not guarantee a larger improvement. Based on these findings, we introduce a simple yet effective baseline that employs minimal regularization and leverages the more beneficial pre-trained model, outperforming state-of-the-art methods when pre-training is applied. Our code is available at https://github.com/eric11220/pretrained-models-in-CL.
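The baseline the abstract describes pairs a pre-trained backbone with as little regularization as possible. The sketch below is a minimal illustration of that recipe, not the authors' released implementation (see the repository linked above): it assumes PyTorch with torchvision 0.13+ (for the pre-trained `weights` API), an ImageNet-pre-trained ResNet-18, and plain reservoir-sampling experience replay; the buffer size, optimizer settings, and helper names are illustrative assumptions.

import random

import torch
import torch.nn as nn
from torchvision import models


class ReplayBuffer:
    """Reservoir-sampled memory of past (image, label) pairs."""

    def __init__(self, capacity=2000):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, x, y):
        for xi, yi in zip(x, y):
            self.seen += 1
            if len(self.data) < self.capacity:
                self.data.append((xi.cpu(), yi.cpu()))
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.data[j] = (xi.cpu(), yi.cpu())

    def sample(self, batch_size):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)


def make_pretrained_model(num_classes):
    # ImageNet-pre-trained backbone with a freshly initialized classifier head.
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model


def train_task(model, loader, buffer, device, lr=0.01, replay_batch=32):
    """Train on one task with plain experience replay and no extra regularizer."""
    model.train()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        n_new = x.size(0)
        # Mix the current batch with replayed samples from earlier tasks.
        if buffer.data:
            rx, ry = buffer.sample(replay_batch)
            x = torch.cat([x, rx.to(device)])
            y = torch.cat([y, ry.to(device)])
        opt.zero_grad()
        criterion(model(x), y).backward()
        opt.step()
        # Store only the current-task samples in memory.
        buffer.add(x[:n_new], y[:n_new])

A class-incremental run would then call train_task once per task's data loader, keeping the same model and buffer throughout, so the only mechanisms against forgetting are the pre-trained initialization and the replayed samples.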