Cache and Reuse: Rethinking the Efficiency of On-device Transfer Learning
Abstract
Training only the last few layers of a deep neural network has been considered an effective strategy for improving the efficiency of on-device training. Prior work has adopted this approach and focused on accelerating backpropagation. However, by conducting a thorough system-wide analysis, we discover that when only the last few layers are trained, the primary bottleneck is actually the forward propagation through the frozen layers rather than backpropagation. To address this issue, we introduce the "cache and reuse" idea for on-device transfer learning and propose a two-stage training method consisting of a cache initialization stage, where we store the outputs of the frozen layers, followed by a training stage. To make our approach practical, we also propose augmented feature caching and cache compression to address the challenges of non-cacheable feature maps and cache size explosion. We carry out extensive experiments on various models (e.g., convolutional neural networks and vision transformers) using real edge devices to demonstrate the effectiveness of our method. For example, on NVIDIA Jetson Orin NX with MobileNet-V2, our approach boosts the training speed by 6.6x and improves the accuracy by 2.1%; for EfficientNet-b0, it increases the training speed by 2.2x and improves the accuracy by 1.3%. Therefore, our approach represents a significant step toward practical on-device transfer learning on edge devices with limited resources.
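To make the two-stage idea concrete, below is a minimal PyTorch-style sketch, assuming a torchvision MobileNet-V2 whose frozen part is the `features` trunk and whose trainable part is a small classifier head; all names and hyperparameters are illustrative assumptions, not the authors' implementation, and the paper's augmented feature caching and cache compression are omitted. The sketch caches the frozen layers' outputs once (stage 1) and then trains only the head on the cached features (stage 2), so the frozen forward pass is never repeated across epochs.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import mobilenet_v2

device = "cuda" if torch.cuda.is_available() else "cpu"
num_classes = 10  # placeholder for the target transfer-learning task

# Frozen backbone: forward-only, never updated during transfer learning.
backbone = mobilenet_v2(weights="IMAGENET1K_V1").features.to(device).eval()
for p in backbone.parameters():
    p.requires_grad_(False)

# Trainable "last few layers" (a simple classifier head here).
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(1280, num_classes),
).to(device)

# Stage 1: cache initialization -- one forward pass through the frozen
# layers per training sample; the outputs are stored for reuse.
@torch.no_grad()
def build_cache(train_loader):
    feats, labels = [], []
    for x, y in train_loader:
        feats.append(backbone(x.to(device)).cpu())
        labels.append(y)
    return TensorDataset(torch.cat(feats), torch.cat(labels))

# Stage 2: training -- every epoch reads cached features, so forward
# propagation through the frozen layers is paid only once overall.
def train_on_cache(cache, epochs=10):
    opt = torch.optim.SGD(head.parameters(), lr=1e-2, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(cache, batch_size=64, shuffle=True)
    head.train()
    for _ in range(epochs):
        for f, y in loader:
            opt.zero_grad()
            loss = loss_fn(head(f.to(device)), y.to(device))
            loss.backward()
            opt.step()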
Related Material

[pdf] [supp] [bibtex]
@InProceedings{Yang_2024_CVPR,
    author    = {Yang, Yuedong and Chiang, Hung-Yueh and Li, Guihong and Marculescu, Diana and Marculescu, Radu},
    title     = {Cache and Reuse: Rethinking the Efficiency of On-device Transfer Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {8040-8049}
}