Learning Sparse Neural Networks Through Mixture-Distributed Regularization

Chang-Ti Huang, Jun-Cheng Chen, Ja-Ling Wu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 694-695


L0-norm regularization is one of the most effective approaches to learning a sparse neural network. Because of its discrete nature, differentiable approximate regularizers based on the concrete distribution or its variants have been proposed as alternatives; however, the concrete relaxation suffers from high-variance gradient estimates and is restricted to that particular distribution. To address these issues, in this paper we propose a more general framework that relaxes binary gates through mixture distributions. Under the proposed method, any pair of mixture components converging to δ(0) and δ(1) can be used to construct smoothed binary gates. We further introduce a reparameterization method for the smoothed binary gates drawn from mixture distributions, enabling efficient gradient-based optimization within the proposed deep-learning algorithm. Extensive experiments show that the proposed approach outperforms other state-of-the-art sparsity-inducing methods in terms of pruned architectures, structured sparsity, and the number of floating-point operations (FLOPs) saved.
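To make the mixture-relaxation idea concrete, the following is a minimal illustrative sketch, not the paper's exact construction: the Bernoulli mixture indicator is relaxed with a concrete (Gumbel-sigmoid) transform so the sample stays differentiable, and the two mixture components are reparameterized truncated Gaussians that concentrate at 0 and 1 as their scale shrinks (i.e., converge to δ(0) and δ(1)). All names here (`sample_gate`, `logit_pi`, `tau`, `scale`) are assumptions chosen for the example.

```python
import numpy as np

def sample_gate(logit_pi, tau=0.5, scale=0.05, rng=None):
    """Illustrative sketch (not the paper's exact method): draw a smoothed
    binary gate in [0, 1] from a two-component mixture whose components
    concentrate at 0 and 1 as `scale` -> 0."""
    rng = rng if rng is not None else np.random.default_rng()
    # Concrete (Gumbel-sigmoid) relaxation of the Bernoulli mixture
    # indicator: differentiable w.r.t. logit_pi, sharpens as tau -> 0.
    u = rng.uniform(1e-6, 1.0 - 1e-6)
    s = 1.0 / (1.0 + np.exp(-(logit_pi + np.log(u) - np.log(1.0 - u)) / tau))
    # Reparameterized component samples: truncated Gaussians near 0 and 1,
    # standing in for any pair converging to delta(0) and delta(1).
    x0 = np.clip(scale * rng.standard_normal(), 0.0, 1.0)
    x1 = np.clip(1.0 + scale * rng.standard_normal(), 0.0, 1.0)
    # Smoothed gate: convex combination steered by the soft indicator.
    return (1.0 - s) * x0 + s * x1
```

With a strongly positive `logit_pi` the samples cluster near 1 (the gate stays on), and with a strongly negative one they cluster near 0 (the unit is effectively pruned); an expected-L0 penalty on the mixture probabilities would then drive gates toward 0.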

Related Material

@InProceedings{Huang_2020_CVPR_Workshops,
    author    = {Huang, Chang-Ti and Chen, Jun-Cheng and Wu, Ja-Ling},
    title     = {Learning Sparse Neural Networks Through Mixture-Distributed Regularization},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2020}
}