Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning
Batch Normalization is a staple of computer vision models, including those employed in few-shot learning. Batch Normalization layers in convolutional neural networks are composed of a normalization step, followed by a shift and scale of these normalized features applied via the per-channel trainable affine parameters gamma and beta. These affine parameters were introduced to maintain the expressive powers of the model following normalization. While this hypothesis holds true for classification within the same domain, this work illustrates that these parameters are detrimental to downstream performance on common few-shot transfer tasks. This effect is studied with multiple methods on well-known benchmarks such as few-shot classification on miniImageNet and cross-domain few-shot learning (CD-FSL). Experiments reveal consistent performance improvements on CNNs with affine unaccompanied Batch Normalization layers; particularly in large domain-shift few-shot transfer settings. As opposed to common practices in few-shot transfer learning where the affine parameters are fixed during the adaptation phase, we show fine-tuning them can lead to improved performance.