Constrained Weight Optimization for Learning Without Activation Normalization
Weight Normalization (WN) is an essential building block in deep learning. However, even state-of-the-art WN methods must be combined with activation normalization methods, such as Batch Normalization (BN), to match the classification accuracy of BN. In this paper, we address this limitation with a weight normalization approach that can be used on its own and achieves classification accuracy competitive with BN. Our approach mimics three fundamental properties of BN: keeping the norm of the weights constant, setting the mean of the weights to zero, and simulating the stochastic perturbations caused by batch sampling bias. Unlike most existing WN methods, which rely on "reparametrization", our method directly optimizes the weights under explicit constraints and thus avoids gradient explosion, a serious drawback of reparametrization. Moreover, we propose an efficient and easy-to-implement algorithm that solves our constrained optimization problem without sacrificing the benefits of the constrained formulation. Classification experiments on three popular benchmark datasets demonstrate that our method is highly competitive with, or even better than, state-of-the-art normalization methods.
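To make the idea of optimizing weights directly under constraints concrete, the following is a minimal sketch and not the algorithm proposed in this paper. It assumes a generic projected gradient step: take a plain gradient step, then project the weights back onto the set of zero-mean vectors with a fixed norm. The function names, the learning rate, and the additive Gaussian noise (a hypothetical stand-in for batch-sampling perturbations) are all illustrative assumptions.

```python
import math
import random


def project_zero_mean_fixed_norm(w, norm=1.0, eps=1e-12):
    """Euclidean projection of w onto {v : mean(v) = 0, ||v||_2 = norm}."""
    mean = sum(w) / len(w)
    centered = [x - mean for x in w]            # enforce the zero-mean constraint
    length = math.sqrt(sum(x * x for x in centered))
    if length < eps:
        raise ValueError("cannot normalize a constant weight vector")
    scale = norm / length                       # enforce the fixed-norm constraint
    return [x * scale for x in centered]


def constrained_step(w, grad, lr=0.1, norm=1.0, noise_std=0.0, rng=None):
    """One gradient step on the raw weights, followed by projection.

    The weights themselves are updated (no reparametrization); the noise term
    is an assumption made for illustration, not the paper's perturbation model.
    """
    rng = rng if rng is not None else random
    noisy_grad = [g + rng.gauss(0.0, noise_std) for g in grad]
    stepped = [x - lr * g for x, g in zip(w, noisy_grad)]
    return project_zero_mean_fixed_norm(stepped, norm)
```

Calling `constrained_step` repeatedly keeps every iterate on the constraint set, so the mean of the weights stays at zero and their norm stays at `norm` regardless of the gradient's magnitude.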