Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 2704-2713

Abstract


The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based visual recognition models call for efficient on-device inference schemes. We propose a quantization scheme, along with a co-designed training procedure, that allows inference to be carried out using integer-only arithmetic while preserving end-to-end model accuracy close to that of floating-point inference. Integer-only inference runs faster than floating-point inference on typical ARM CPUs and can be implemented on integer-arithmetic-only hardware such as mobile accelerators (e.g. Qualcomm Hexagon). By quantizing both weights and activations to 8-bit integers, we obtain a nearly 4x reduction in memory footprint compared to 32-bit floating-point representations. Even for MobileNets, a model family known for runtime efficiency, our quantization approach yields an improved tradeoff between latency and accuracy on popular ARM CPUs for ImageNet classification and COCO detection.
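As a rough illustration of this kind of 8-bit affine quantization (a minimal NumPy sketch, not the paper's actual implementation), the snippet below maps a float tensor to uint8 codes via the correspondence r ≈ scale · (q − zero_point) and shows the resulting ~4x storage saving; the function names and the toy weight tensor are ours, introduced only for this example.

import numpy as np

def quantize_affine(r, num_bits=8):
    # Map a float tensor r to uint8 codes q such that r ~= scale * (q - zero_point).
    qmin, qmax = 0, 2 ** num_bits - 1
    rmin, rmax = min(float(r.min()), 0.0), max(float(r.max()), 0.0)  # range must include 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))  # integer, so real 0.0 is exactly representable
    q = np.clip(np.round(r / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize_affine(q, scale, zero_point):
    # Recover an approximation of the original float tensor.
    return scale * (q.astype(np.int32) - zero_point)

# Toy example: the uint8 codes take ~4x less memory than float32 and stay close in value.
w = np.random.randn(256, 256).astype(np.float32)
q, scale, zero_point = quantize_affine(w)
print(w.nbytes / q.nbytes)                                        # 4.0
print(np.abs(dequantize_affine(q, scale, zero_point) - w).max())  # small quantization error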

Related Material


@InProceedings{Jacob_2018_CVPR,
  author    = {Jacob, Benoit and Kligys, Skirmantas and Chen, Bo and Zhu, Menglong and Tang, Matthew and Howard, Andrew and Adam, Hartwig and Kalenichenko, Dmitry},
  title     = {Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2018},
  pages     = {2704-2713}
}