BibTeX:
@InProceedings{Al_Nahian_2025_CVPR,
  author    = {Al Nahian, Md Jaber and Ghosh, Tapotosh and Sheikhi, Farnaz and Maleki, Farhad},
  title     = {Agri-FM+: A Self-Supervised Foundation Model for Agricultural Vision},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2025},
  pages     = {5520-5532}
}
Agri-FM+: A Self-Supervised Foundation Model for Agricultural Vision
Abstract
Foundation models have revolutionized computer vision, yet their adoption in precision agriculture remains limited due to significant domain shifts from natural images. Existing agricultural foundation models focus primarily on remote sensing applications; to date, no dedicated foundation model exists for close-field agricultural vision. In this paper, we propose Agri-FM+, a self-supervised foundation model specifically tailored for agricultural vision, trained via a two-stage continual learning pipeline. Starting from publicly available self-supervised ImageNet weights from SlotCon, Agri-FM+ is continually adapted on a curated 147K-image agricultural dataset using the same SlotCon objective. Evaluated across eight diverse benchmarks---covering object detection, semantic segmentation, and instance segmentation tasks---Agri-FM+ consistently outperforms both ImageNet-pretrained and randomly initialized models. Under full supervision, it achieves average gains of +1.27% over supervised ImageNet pretraining and +8.25% over random initialization. Even when trained with only 10% of the annotated data, Agri-FM+ maintains robust performance, achieving gains of +1.02% and +4.54% over supervised ImageNet pretraining and random initialization, respectively. These results demonstrate the ability of Agri-FM+ to provide domain-adapted, label-efficient representations that scale effectively across real-world agricultural vision tasks. The code, weights, and more details will be made available at: https://github.com/FarhadMaleki/AgriFMPlus.