Pruning as a Binarization Technique

Frickenstein, Lukas; Mori, Pierpaolo; Sampath, Shambhavi Balamuthu; Thoma, Moritz; Fasfous, Nael; Vemparala, Manoj Rohit; Frickenstein, Alexander; Unger, Christian; Passerone, Claudio; Stechele, Walter

Lukas Frickenstein, Pierpaolo Mori, Shambhavi Balamuthu Sampath, Moritz Thoma, Nael Fasfous, Manoj Rohit Vemparala, Alexander Frickenstein, Christian Unger, Claudio Passerone, Walter Stechele; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 2131-2140

Abstract

Convolutional neural networks (CNNs) can be quantized to reduce the bit-width of their weights and activations. Pruning is another compression technique, in which entire structures are removed from a CNN's computation graph. Multi-bit networks (MBNs) encode the operands (weights and activations) of the convolution into multiple binary bases, where the bit-width of a particular operand is equal to its number of binary bases. Therefore, this work views pruning an individual binary base in an MBN as a reduction in the bit-width of its operands, i.e., quantization. Although many binarization methods have improved the accuracy of binary neural networks (BNNs) by, e.g., minimizing quantization error, improving training strategies, or proposing different network architecture designs, we reveal a new viewpoint to achieve high-accuracy BNNs, which leverages pruning as a binarization technique (PaBT). We exploit gradient information that exposes the importance of each binary convolution and its contribution to the loss. We prune entire binary convolutions, reducing the effective bit-widths of the MBN during training. This ultimately results in a smooth convergence to accurate BNNs. PaBT achieves 2.9 p.p., 1.6 p.p., and 0.9 p.p. better accuracy than the state-of-the-art (SotA) BNNs IR-Net, LNS, and SiMaN, respectively, on the ImageNet dataset. Further, PaBT scales to the more complex task of semantic segmentation, outperforming ABC-Net on the CityScapes dataset. This positions PaBT as a novel high-accuracy binarization scheme and makes it the first to expose the potential of latent-weight-free training for compression techniques.
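
The idea that pruning one binary base lowers the bit-width by one can be illustrated with a small sketch. It is not the authors' implementation: it assumes a greedy residual binarization (w ≈ Σ αᵢ·bᵢ with bᵢ ∈ {-1, +1}) as the multi-bit encoding and a first-order Taylor score as the gradient-based importance, and it operates on a flat weight list rather than on convolutions inside a training loop. All function names, the random weights, and the placeholder gradient are hypothetical.

```python
import random


def sign(x):
    """Binarize a scalar to -1.0 or +1.0 (zero maps to +1.0)."""
    return 1.0 if x >= 0 else -1.0


def residual_binarize(weights, num_bases):
    """Greedy multi-bit encoding: w ~ sum_i alpha_i * b_i, b_i in {-1, +1}.

    Assumption: each base binarizes the residual left by the previous ones,
    with alpha_i set to the mean absolute residual.
    """
    residual = list(weights)
    bases = []
    for _ in range(num_bases):
        alpha = sum(abs(r) for r in residual) / len(residual)
        b = [sign(r) for r in residual]
        bases.append((alpha, b))
        residual = [r - alpha * s for r, s in zip(residual, b)]
    return bases


def reconstruct(bases, n):
    """Sum the scaled binary bases back into an n-element weight vector."""
    out = [0.0] * n
    for alpha, b in bases:
        for j in range(n):
            out[j] += alpha * b[j]
    return out


def mse(a, b):
    """Mean squared error between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def taylor_importance(bases, grad):
    """First-order estimate of the loss change from removing each base:
    |sum_j grad_j * alpha_i * b_ij| (an assumed importance criterion)."""
    return [abs(sum(g * alpha * s for g, s in zip(grad, b)))
            for alpha, b in bases]


def prune_least_important(bases, grad):
    """Drop the base with the lowest score, i.e. reduce the bit-width by one."""
    scores = taylor_importance(bases, grad)
    drop = scores.index(min(scores))
    return [base for i, base in enumerate(bases) if i != drop]


def prune_to_binary(bases, grad):
    """Prune bases one at a time until a single binary base remains.

    In actual training the gradient would be recomputed between pruning
    steps; a fixed gradient is used here only to keep the sketch short.
    """
    while len(bases) > 1:
        bases = prune_least_important(bases, grad)
    return bases


if __name__ == "__main__":
    random.seed(0)
    weights = [random.uniform(-1.0, 1.0) for _ in range(16)]
    grad = [random.gauss(0.0, 1.0) for _ in range(16)]  # placeholder gradient
    bases = residual_binarize(weights, num_bases=3)      # 3-bit weights
    while True:
        approx = reconstruct(bases, len(weights))
        print(f"{len(bases)} base(s): mse = {mse(weights, approx):.4f}")
        if len(bases) == 1:
            break
        bases = prune_least_important(bases, grad)
```

Running the script prints the approximation error at each bit-width as bases are removed. When one base remains, the weights are a single {-1, +1} tensor with one scale factor, which is the binary case that the abstract describes as the end point of pruning.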

Related Material

[bibtex]

@InProceedings{Frickenstein_2024_CVPR,
    author    = {Frickenstein, Lukas and Mori, Pierpaolo and Sampath, Shambhavi Balamuthu and Thoma, Moritz and Fasfous, Nael and Vemparala, Manoj Rohit and Frickenstein, Alexander and Unger, Christian and Passerone, Claudio and Stechele, Walter},
    title     = {Pruning as a Binarization Technique},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {2131-2140}
}