Local Masked Reconstruction for Efficient Self-Supervised Learning on High-Resolution Images

Jun Chen, Faizan Farooq Khan, Ming Hu, Ammar Sherif, Zongyuan Ge, Boyang Li, Mohamed Elhoseiny; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 8035-8045

Abstract


Self-supervised learning for computer vision has progressed tremendously and improved many downstream vision tasks, such as image classification, semantic segmentation, and object detection. Among these, generative self-supervised vision learning approaches such as MAE and BEiT show promising performance. However, their global reconstruction mechanism is computationally demanding, especially for high-resolution images, and the computational cost grows substantially when scaling to large datasets. To address this issue, we propose local masked reconstruction (LoMaR), a simple yet effective approach that reconstructs image patches from small neighboring regions. The strategy can be easily integrated into any generative self-supervised learning technique and improves the trade-off between efficiency and accuracy compared to reconstruction over the entire image. LoMaR is 2.5x faster than MAE and 5.0x faster than BEiT on 384x384 ImageNet pretraining, and surpasses them by 0.2% and 0.8% in accuracy, respectively. It is 2.1x faster than MAE on iNaturalist pretraining and gains 0.2% in accuracy. On MS COCO, LoMaR outperforms MAE by 0.5 APbox on object detection and 0.5 APmask on instance segmentation. It also outperforms MAE by 0.2% on semantic segmentation. Our code and pretrained models are available at: https://github.com/junchen14/LoMaR.
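
As a rough illustration of reconstructing patches from a small neighboring region rather than from the whole image, the sketch below samples one local window on a patch grid and masks part of it. It is a minimal NumPy example, not the authors' implementation: the function name `sample_local_window_mask`, the 24x24 patch grid (a 384x384 image split into 16x16 patches), the 7x7 window, and the 0.8 mask ratio are all assumptions chosen for illustration and are not stated in the abstract.

```python
import numpy as np


def sample_local_window_mask(grid_size, window_size, mask_ratio, rng):
    """Pick a random window_size x window_size window on a
    grid_size x grid_size patch grid and mask a fraction of its patches.

    Hypothetical helper for illustration only.
    Returns (window_idx, mask): flat row-major patch indices inside the
    window, and a boolean array marking which of them are masked.
    """
    if not 0 < window_size <= grid_size:
        raise ValueError("window_size must be in (0, grid_size]")
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")

    # Top-left corner, chosen so the window stays inside the grid.
    top = int(rng.integers(0, grid_size - window_size + 1))
    left = int(rng.integers(0, grid_size - window_size + 1))

    rows = np.arange(top, top + window_size)
    cols = np.arange(left, left + window_size)

    # Flat (row-major) indices of the patches inside the window.
    window_idx = (rows[:, None] * grid_size + cols[None, :]).reshape(-1)

    # Mask a fixed fraction of the window's patches, chosen at random.
    num_masked = int(round(mask_ratio * window_idx.size))
    mask = np.zeros(window_idx.size, dtype=bool)
    mask[rng.choice(window_idx.size, size=num_masked, replace=False)] = True
    return window_idx, mask


# Assumed settings: 24x24 patch grid, 7x7 window, 80% of the window masked.
rng = np.random.default_rng(0)
window_idx, mask = sample_local_window_mask(24, 7, 0.8, rng)
visible_patches = window_idx[~mask]  # patches an encoder would see
target_patches = window_idx[mask]    # patches to reconstruct
```

Under these assumed settings, each sampled window covers 49 of the 576 patches, so a model that attends only within the window processes far fewer tokens per sample than one that reconstructs over the full grid.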

Related Material


@InProceedings{Chen_2025_WACV,
    author    = {Chen, Jun and Khan, Faizan Farooq and Hu, Ming and Sherif, Ammar and Ge, Zongyuan and Li, Boyang and Elhoseiny, Mohamed},
    title     = {Local Masked Reconstruction for Efficient Self-Supervised Learning on High-Resolution Images},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {8035-8045}
}