Initialization and Transfer Learning of Stochastic Binary Networks From Real-Valued Ones
We consider the training of binary neural networks (BNNs) using the stochastic relaxation approach, which leads to stochastic binary networks (SBNs). We identify that a severe obstacle to training deep SBNs without skip connections arises already at initialization. While smaller models can be trained from a random (possibly data-driven) initialization, for deeper models and large datasets it becomes increasingly difficult to obtain non-vanishing, low-variance gradients when initializing randomly. In this work, we initialize SBNs from real-valued networks with ReLU activations. Real-valued networks are well established, easier to train, and benefit from many techniques that improve their generalization. We propose that closely approximating their internal features provides a good initialization for SBNs. We transfer features incrementally, layer by layer, accounting for the noise in the SBN, exploiting equivalent reparametrizations of ReLU networks, and using a novel transfer loss formulation. We demonstrate experimentally that with the proposed initialization, binary networks train faster and achieve higher accuracy than when initialized randomly.
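The layer-by-layer transfer idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual procedure: it assumes logistic injected noise (so the expected binary activation is tanh(pre/2)) and fits only a hypothetical per-unit output scale by closed-form least squares, whereas the proposed method uses equivalent ReLU reparametrizations and a dedicated transfer loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_binary(pre):
    """Mean activation of a {-1,+1} stochastic binary unit with logistic
    injected noise: E[sign(pre + z)] = tanh(pre / 2)."""
    return np.tanh(pre / 2.0)

def transfer_scales(W, b, X):
    """Illustrative single-layer transfer step (hypothetical helper):
    reuse the real layer's weights (W, b) in the binary layer and fit a
    per-unit output scale s so that s * E[binary feature] matches the
    ReLU feature in the least-squares sense."""
    pre = X @ W + b                   # shared pre-activations
    target = np.maximum(pre, 0.0)     # real-valued ReLU features to match
    m = mean_binary(pre)              # expected binary features
    # closed-form per-unit least squares: s_j = <target_j, m_j> / <m_j, m_j>
    num = np.sum(target * m, axis=0)
    den = np.sum(m * m, axis=0) + 1e-12
    s = num / den
    loss = np.mean((s * m - target) ** 2)
    return s, loss

# toy layer and data
X = rng.normal(size=(256, 8))
W = rng.normal(size=(8, 4))
b = rng.normal(size=4)
s, loss = transfer_scales(W, b, X)
# transfer loss with the trivial scale s = 1, for comparison
baseline = np.mean((mean_binary(X @ W + b) - np.maximum(X @ W + b, 0.0)) ** 2)
```

Since s = 1 is a feasible point of the per-unit least-squares problem, the fitted transfer loss can never exceed the unscaled one; in a full pipeline such a fit would be applied incrementally, one layer at a time, before fine-tuning the whole SBN.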