Hyperblock Floating Point: Generalised Quantization Scheme for Gradient and Inference Computation

Marcelo Gennari do Nascimento, Victor Adrian Prisacariu, Roger Fawcett, Martin Langhammer; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 6364-6373

Abstract


Prior quantization methods focus on producing networks for fast and lightweight inference. However, the cost of unquantised training is overlooked, even though training requires significantly more time and energy than inference. We present a method for quantizing convolutional neural networks for efficient training. Quantizing gradients is challenging because they require higher granularity and their values span a wider range than the weights and feature maps. We propose an extension of the Channel-wise Block Floating Point format that allows for quick gradient computation with minimal quantization overhead. This is achieved by sharing an exponent across both the depth and batch dimensions, so that tensors are quantized once and reused during backpropagation. We test our method using standard models such as AlexNet, VGG, and ResNet on the CIFAR10, SVHN and ImageNet datasets. We show no loss of accuracy when quantizing AlexNet weights, activations and gradients to only 4 bits when training on ImageNet.
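
To make the shared-exponent idea concrete, below is a minimal NumPy sketch of block floating point quantization in which one exponent is shared over a chosen set of tensor axes and every element keeps a low-bit signed mantissa relative to it. The function name block_quantize, the 4-bit mantissa width, and the choice of sharing over the batch and channel axes are illustrative assumptions for this sketch; they are not taken from the paper and the exact block geometry of the proposed Hyperblock format may differ.

    import numpy as np

    def block_quantize(x, mantissa_bits=4, block_axes=(0, 1)):
        """Simulate block floating point quantization (illustrative sketch).

        One exponent is shared per block defined by block_axes; every
        element in the block is stored as a signed fixed-point mantissa
        relative to that exponent. The block layout here is an assumption,
        not necessarily the paper's exact scheme.
        """
        # Shared exponent: smallest power of two covering the block's
        # largest magnitude (zero blocks fall back to the smallest normal).
        max_abs = np.max(np.abs(x), axis=block_axes, keepdims=True)
        exponent = np.ceil(np.log2(np.maximum(max_abs, np.finfo(x.dtype).tiny)))

        # Each element becomes a low-bit signed mantissa w.r.t. the shared
        # exponent; clipping handles the exact power-of-two edge case.
        scale = 2.0 ** (exponent - (mantissa_bits - 1))
        lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
        mantissa = np.clip(np.round(x / scale), lo, hi)
        return (mantissa * scale).astype(x.dtype)  # dequantized simulation

    # Example: a gradient-like tensor of shape (batch, channels, H, W),
    # quantized once with exponents shared over the batch and channel axes
    # so the same quantized tensor could be reused in the backward pass.
    grad = np.random.randn(8, 16, 5, 5).astype(np.float32)
    grad_q = block_quantize(grad, mantissa_bits=4, block_axes=(0, 1))
    print(np.max(np.abs(grad - grad_q)))

Sharing the exponent over more axes reduces how often tensors must be re-quantized, at the cost of coarser scaling within each block; the 4-bit mantissa in the example mirrors the bit width reported in the abstract for AlexNet on ImageNet.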

Related Material


[bibtex]
@InProceedings{Nascimento_2023_WACV,
  author    = {Nascimento, Marcelo Gennari do and Prisacariu, Victor Adrian and Fawcett, Roger and Langhammer, Martin},
  title     = {Hyperblock Floating Point: Generalised Quantization Scheme for Gradient and Inference Computation},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {January},
  year      = {2023},
  pages     = {6364-6373}
}