Edge Inference With Fully Differentiable Quantized Mixed Precision Neural Networks

Schaefer, Clemens JS; Joshi, Siddharth; Li, Shan; Blazquez, Raul

Clemens JS Schaefer, Siddharth Joshi, Shan Li, Raul Blazquez; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 8460-8469

Abstract

The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference, facilitating the use of DNNs on edge computing platforms. Recent efforts at quantizing DNNs have employed a range of techniques encompassing progressive quantization, step-size adaptation, and gradient scaling. This paper proposes a new quantization approach for mixed precision convolutional neural networks (CNNs) targeting edge-computing. Our method establishes a new pareto frontier in model accuracy and memory footprint demonstrating a range of pre-trained quantized models, delivering best-in-class accuracy below 4.3 MB of weights and activations without modifying the model architecture. Our main contributions are: (i) a method for tensor-sliced learned precision with a hardware-aware cost function for heterogeneous differentiable quantization, (ii) targeted gradient modification for weights and activations to mitigate quantization errors, and (iii) a multi-phase learning schedule to address instability in learning arising from updates to the learned quantizer and model parameters. We demonstrate the effectiveness of our techniques on the ImageNet dataset across a range of models including EfficientNet-Lite0 (e.g., 4.14MB of weights and activations at 67.66% accuracy) and MobileNetV2 (e.g., 3.51MB weights and activations at 65.39% accuracy).

Related Material

[pdf] [supp] [arXiv]

[bibtex]

@InProceedings{Schaefer_2024_WACV, author = {Schaefer, Clemens JS and Joshi, Siddharth and Li, Shan and Blazquez, Raul}, title = {Edge Inference With Fully Differentiable Quantized Mixed Precision Neural Networks}, booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)}, month = {January}, year = {2024}, pages = {8460-8469} }