SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization

Shijie Cao, Lingxiao Ma, Wencong Xiao, Chen Zhang, Yunxin Liu, Lintao Zhang, Lanshun Nie, Zhi Yang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 11216-11225

Abstract


In this paper, we present a novel and general method to accelerate convolutional neural network (CNN) inference by taking advantage of feature-map sparsity. We experimentally demonstrate that a highly quantized version of the original network is sufficient to predict the output sparsity accurately, and verify that leveraging such sparsity in inference incurs a negligible accuracy drop compared with the original network. To accelerate inference, for each convolution layer our approach first obtains a binary sparsity mask of the output feature maps by running inference on a quantized version of the original network layer, and then conducts a full-precision sparse convolution to compute the precise values of the non-zero outputs. Compared with existing work, our approach avoids the overhead of training additional auxiliary networks, while still being applicable to general CNN networks without being limited to certain application domains.
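The two-step scheme described above can be illustrated with a minimal PyTorch sketch. Note that this is not the authors' implementation: the `quantize` and `seer_conv2d` helpers are hypothetical, quantization is emulated with simple uniform rounding, and the "sparse" full-precision step is emulated by masking a dense convolution rather than using an efficient sparse kernel.

```python
import torch
import torch.nn.functional as F

def quantize(x, bits=4):
    # Hypothetical uniform quantizer: snap values onto a low-bit grid (illustration only).
    scale = x.abs().max() / (2 ** (bits - 1) - 1)
    return torch.round(x / scale) * scale

def seer_conv2d(x, weight, bias=None, bits=4):
    # Step 1: run a low-bit version of the layer to predict which outputs the
    # following ReLU would zero out, yielding a binary sparsity mask.
    q_out = F.conv2d(quantize(x, bits), quantize(weight, bits), bias=bias, padding=1)
    mask = (q_out > 0)

    # Step 2: compute full-precision values only for the predicted non-zero outputs.
    # A real implementation would dispatch a sparse convolution kernel; here the
    # sparse computation is emulated by masking a dense convolution.
    full_out = F.conv2d(x, weight, bias=bias, padding=1)
    return torch.relu(full_out) * mask

# Usage: one 3x3 convolution layer followed by ReLU.
x = torch.randn(1, 16, 32, 32)
w = torch.randn(32, 16, 3, 3)
y = seer_conv2d(x, w)
```

Because the mask only has to distinguish zero from non-zero outputs, a very low-bit prediction pass suffices, and its cost is small relative to the full-precision sparse computation it enables.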

Related Material


[pdf]
[bibtex]
@InProceedings{Cao_2019_CVPR,
author = {Cao, Shijie and Ma, Lingxiao and Xiao, Wencong and Zhang, Chen and Liu, Yunxin and Zhang, Lintao and Nie, Lanshun and Yang, Zhi},
title = {SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}