[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Schaefer_2024_WACV,
  author    = {Schaefer, Clemens JS and Joshi, Siddharth and Li, Shan and Blazquez, Raul},
  title     = {Edge Inference With Fully Differentiable Quantized Mixed Precision Neural Networks},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {January},
  year      = {2024},
  pages     = {8460--8469}
}
Edge Inference With Fully Differentiable Quantized Mixed Precision Neural Networks
Abstract
The large compute and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference, facilitating the use of DNNs on edge computing platforms. Recent efforts at quantizing DNNs have employed a range of techniques encompassing progressive quantization, step-size adaptation, and gradient scaling. This paper proposes a new quantization approach for mixed precision convolutional neural networks (CNNs) targeting edge computing. Our method establishes a new Pareto frontier in model accuracy and memory footprint, demonstrating a range of pre-trained quantized models that deliver best-in-class accuracy below 4.3 MB of weights and activations without modifying the model architecture. Our main contributions are: (i) a method for tensor-sliced learned precision with a hardware-aware cost function for heterogeneous differentiable quantization, (ii) targeted gradient modification for weights and activations to mitigate quantization errors, and (iii) a multi-phase learning schedule to address instability in learning arising from updates to the learned quantizer and model parameters. We demonstrate the effectiveness of our techniques on the ImageNet dataset across a range of models, including EfficientNet-Lite0 (e.g., 4.14 MB of weights and activations at 67.66% accuracy) and MobileNetV2 (e.g., 3.51 MB of weights and activations at 65.39% accuracy).
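To make the core idea concrete, below is a minimal sketch of the kind of uniform "fake" quantizer with a learned step size and a straight-through gradient estimator that differentiable quantization methods of this family build on. This is an illustrative reconstruction, not the authors' implementation: the function names, the NumPy formulation, and the 4-bit default are assumptions for exposition.

```python
import numpy as np

def fake_quantize(x, step, n_bits=4, signed=True):
    # Uniform "fake" quantization: scale by the learned step size, round to
    # the nearest integer level, clip to the representable range, and map
    # back to the original scale. `step` would be a trainable parameter.
    if signed:
        qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** n_bits - 1
    q = np.clip(np.round(x / step), qmin, qmax)
    return q * step

def ste_grad_mask(x, step, n_bits=4, signed=True):
    # Straight-through estimator for d(output)/d(x): the non-differentiable
    # round() is treated as identity, so the gradient is 1 where the scaled
    # input lies inside the clipping range and 0 where it is clipped away.
    # Targeted modifications of this mask are one way to mitigate
    # quantization error during training.
    if signed:
        qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** n_bits - 1
    scaled = x / step
    return ((scaled >= qmin) & (scaled <= qmax)).astype(x.dtype)
```

With a 4-bit signed quantizer and step 0.1, an in-range value such as 0.23 snaps to the nearest level (0.2) and passes gradients through, while an out-of-range value such as 5.0 saturates at the top level (0.7) and receives zero gradient.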