LCS: Learning Compressible Subspaces for Efficient, Adaptive, Real-Time Network Compression at Inference Time

Elvis Nunez, Maxwell Horton, Anish Prabhu, Anurag Ranjan, Ali Farhadi, Mohammad Rastegari; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 3818-3827

Abstract


When deploying deep neural networks (DNNs) to a device, it is traditionally assumed that available computational resources (compute, memory, and power) remain static. However, real-world computing systems do not always provide stable resource guarantees. Computational resources need to be conserved when load from other processes is high, or available memory is low. In this work, we present a training procedure to produce DNNs that can be compressed in real-time to arbitrary compression levels entirely on-device. This enables the deployment of a single model that can efficiently adapt to its host device's available resources. We formulate this problem as learning an adaptively compressible network subspace, where one end is optimized for accuracy, and the other for efficiency. Our subspace model requires no recalibration nor retraining when changing compression levels. Moreover, our generic training framework is amenable to multiple forms of compression, and we present results for unstructured sparsity, structured sparsity, and quantization on a variety of architectures. We present models that require a single extra copy of network parameters, as well as models that require no extra parameters. Both models allow for operation at any compression level within a wide range (for example, 0% to 90% for structured sparsity with ResNet18 on ImageNet). At each compression level, our models achieve an accuracy comparable to a baseline model optimized for that particular compression level. To our knowledge, our method is the first to enable adaptive on-device network compression with little to no computational overhead.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Nunez_2023_WACV, author = {Nunez, Elvis and Horton, Maxwell and Prabhu, Anish and Ranjan, Anurag and Farhadi, Ali and Rastegari, Mohammad}, title = {LCS: Learning Compressible Subspaces for Efficient, Adaptive, Real-Time Network Compression at Inference Time}, booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)}, month = {January}, year = {2023}, pages = {3818-3827} }