Neural Redshift: Random Networks are not Random Functions

Damien Teney, Armand Mihai Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 4786-4796

Abstract


Our understanding of the generalization capabilities of neural networks (NNs) is still incomplete. Prevailing explanations are based on implicit biases of gradient descent (GD), but they cannot account for the capabilities of models obtained with gradient-free methods, nor for the simplicity bias recently observed in untrained networks. This paper seeks other sources of generalization in NNs. To understand the inductive biases provided by architectures independently from GD, we examine untrained, random-weight networks. Even simple MLPs show strong inductive biases: uniform sampling in weight space yields a very biased distribution of functions in terms of complexity. But contrary to common wisdom, NNs do not have an inherent "simplicity bias". This property depends on components such as ReLUs, residual connections, and layer normalizations. Alternative architectures can be built with a bias for any level of complexity. Transformers also inherit all these properties from their building blocks. We provide a fresh explanation for the success of deep learning independent from gradient-based training. It points at promising avenues for controlling the solutions implemented by trained models.
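
The sketch below illustrates the kind of experiment the abstract describes: sampling random-weight MLPs (no training) and comparing the complexity of the functions they implement under different activation functions. It is not the authors' code; the weight initialization scale, network sizes, and the Fourier-based complexity proxy are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation): sample untrained MLPs with
# random weights and measure a simple complexity proxy of the 1-D functions
# they compute. All hyperparameters and the high-frequency-energy measure are
# illustrative assumptions.
import numpy as np

def random_mlp(width=256, depth=4, activation=np.tanh, rng=None):
    """Return f: (N, 1) -> (N,) with weights sampled i.i.d. Gaussian."""
    rng = np.random.default_rng(0) if rng is None else rng
    dims = [1] + [width] * depth + [1]
    Ws = [rng.normal(0.0, 1.0 / np.sqrt(dims[i]), (dims[i], dims[i + 1]))
          for i in range(len(dims) - 1)]
    bs = [rng.normal(0.0, 0.1, dims[i + 1]) for i in range(len(dims) - 1)]

    def f(x):
        h = x
        for W, b in zip(Ws[:-1], bs[:-1]):   # hidden layers
            h = activation(h @ W + b)
        return (h @ Ws[-1] + bs[-1]).ravel()  # linear output layer
    return f

def high_freq_energy(y, cutoff=10):
    """Fraction of spectral energy above `cutoff` (DC term excluded)."""
    spec = np.abs(np.fft.rfft(y - y.mean())) ** 2
    return spec[cutoff:].sum() / spec[1:].sum()

x = np.linspace(-1.0, 1.0, 1024).reshape(-1, 1)
relu = lambda z: np.maximum(z, 0.0)
for name, act in [("ReLU", relu), ("tanh", np.tanh), ("sin", np.sin)]:
    scores = [high_freq_energy(
                  random_mlp(activation=act, rng=np.random.default_rng(s))(x))
              for s in range(20)]
    print(f"{name:5s} mean high-frequency energy: {np.mean(scores):.3f}")
```

Under this proxy, random ReLU/tanh networks tend to concentrate their spectral energy at low frequencies (simpler functions), whereas a sin activation shifts energy toward higher frequencies, consistent with the abstract's claim that the apparent simplicity bias depends on architectural components rather than being inherent to NNs.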

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Teney_2024_CVPR,
    author    = {Teney, Damien and Nicolicioiu, Armand Mihai and Hartmann, Valentin and Abbasnejad, Ehsan},
    title     = {Neural Redshift: Random Networks are not Random Functions},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {4786-4796}
}