[pdf] [supp] [arXiv] [bibtex]

@InProceedings{Erdogan_2025_ICCV,
    author    = {Erdogan, Goker and Parthasarathy, Nikhil and Ionescu, Catalin and Hudson, Drew A. and Lerchner, Alexander and Zisserman, Andrew and Sajjadi, Mehdi S. M. and Carreira, Joao},
    title     = {LayerLock: Non-collapsing Representation Learning with Progressive Freezing},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {19461-19470}
}
LayerLock: Non-collapsing Representation Learning with Progressive Freezing
Abstract
We introduce LayerLock, a simple yet effective approach to self-supervised visual representation learning that, via progressive layer freezing, gradually transitions during training from predicting shallow features to predicting deeper ones. First, we make the observation that during training of video masked-autoencoding (MAE) models, ViT layers converge in order of their depth: shallower layers converge early, deeper layers converge late. We then show that this observation can be exploited to accelerate standard MAE by progressively freezing the model throughout training according to an explicit schedule. Furthermore, this same schedule can be used in a simple and scalable approach to latent prediction that does not suffer from "representation collapse". We apply our proposed approach, LayerLock, to both pixel and latent prediction with large models of up to 4B parameters, and show improvements on both semantic (action classification) and low-level (depth estimation) vision tasks.
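As a rough illustration of the progressive-freezing idea described in the abstract, the sketch below freezes the shallowest ViT blocks first according to a simple linear schedule. All names (apply_layerlock_freezing, vit_blocks, max_frozen) and the linear form of the schedule are illustrative assumptions, not details taken from the paper.

    # Illustrative sketch only: the linear schedule and all names here are
    # assumptions, not the paper's actual implementation.
    import torch.nn as nn

    def apply_layerlock_freezing(vit_blocks: nn.ModuleList,
                                 step: int,
                                 total_steps: int,
                                 max_frozen: int) -> int:
        """Freeze the shallowest transformer blocks first, on a linear schedule.

        Returns the number of blocks currently frozen; a latent-prediction
        variant could, for example, read its prediction target at this depth.
        """
        progress = min(step / max(total_steps, 1), 1.0)   # fraction of training done
        num_frozen = int(progress * max_frozen)           # freezing front moves deeper over time

        for i, block in enumerate(vit_blocks):
            trainable = i >= num_frozen                   # blocks behind the front stop updating
            for p in block.parameters():
                p.requires_grad = trainable
            block.train(mode=trainable)                   # keep frozen blocks in eval mode
        return num_frozen

A training loop would call such a function once per step (or per epoch) before the optimizer update, so that only the still-trainable deeper blocks receive gradients while the frozen shallow blocks act as a fixed feature extractor.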