FalCon: Fine-Grained Feature Map Sparsity Computing With Decomposed Convolutions for Inference Optimization
Many works focus on the model's static parameter optimization (e.g., filters and weights) for CNN inference acceleration. Compared to parameter sparsity, feature map sparsity is per-input related which has better adaptability. The practical sparsity patterns are non-structural and randomly located on feature maps with non-identical shapes. However, the existing feature map sparsity works take computing efficiency as the primary goal, thereby they can only remove structural sparsity and fail to match the above characteristics. In this paper, we develop a novel sparsity computing scheme called FalCon, which can well adapt to the practical sparsity patterns while still maintaining efficient computing. Specifically, we first propose a decomposed convolution design that enables a fine-grained computing unit for sparsity. Additionally, a decomposed convolution computing optimization paradigm is proposed to convert the sparse computing units to practical acceleration. Extensive experiments show that FalCon achieves at most 67.30% theoretical computation reduction with a neglected accuracy drop while accelerating CNN inference by 37%.