@InProceedings{Ghosh_2025_ICCV,
  author    = {Ghosh, Tapotosh and Al Nahian, Md Jaber and Sheikhi, Farnaz and Maleki, Farhad},
  title     = {Decoder-aware Self-Supervised Continual Pretraining and Uncertainty-Guided Pseudo-Labeling for Wheat Organ Segmentation},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2025},
  pages     = {7197-7207}
}
Decoder-aware Self-Supervised Continual Pretraining and Uncertainty-Guided Pseudo-Labeling for Wheat Organ Segmentation
Abstract
Accurate segmentation of wheat organs---spikes, leaves, and stems---is critical for plant phenotyping but remains challenging due to domain variability and limited labeled data. In this work, we propose a label-efficient segmentation framework that uses only 99 labeled samples from the GWFSS dataset. Our approach begins with continual decoder-aware pretraining (DeCon-ML), in which a ConvNeXt-L encoder (initialized with ImageNet-1K weights) is paired with a Feature Pyramid Network (FPN) decoder and pretrained on 64,368 unlabeled wheat images in two stages: first training the decoder with the encoder frozen, then jointly pretraining the encoder and decoder. For fine-tuning, we train ConvNeXt-L models on the 99-sample labeled set, initialized with ImageNet-1K and GWFSS-pretrained weights, and ensemble them to leverage complementary representations. We further improve generalization by incorporating pseudo-labels, selected by feature-space similarity to the labeled set and filtered through uncertainty estimation to discard low-confidence predictions. Finally, we ensemble a BEiT-L model, trained only on the labeled training set, with the ConvNeXt-L models to obtain our best results. Our approach achieves a mean Intersection over Union (mIoU) of 73.61 on the validation set and 67.88 on the test set of the GWFSS dataset, effectively segmenting all four classes (spikes, leaves, stems, and background) under varying conditions. This study demonstrates how combining continual pretraining, similarity-aware candidate selection, and uncertainty-guided pseudo-labeling can significantly improve semantic segmentation with minimal supervision in agricultural vision. Code is available at https://github.com/tapu1996/DeCon-UGPL-GWFSS.
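The uncertainty-guided filtering step described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: it scores each candidate pseudo-labeled image by the mean per-pixel predictive entropy of its softmax output and keeps only images below a threshold. The entropy measure, the `max_mean_entropy` value, and the function names are illustrative assumptions.

```python
import math

def pixel_entropy(probs):
    # Shannon entropy of one pixel's class distribution
    # (probs sums to 1 over the 4 classes: spike, leaf, stem, background).
    return -sum(p * math.log(p) for p in probs if p > 0)

def filter_pseudo_labels(prob_maps, max_mean_entropy=0.5):
    """Return indices of candidate images whose mean per-pixel predictive
    entropy falls below a threshold (0.5 is an assumed value); higher mean
    entropy indicates less confident predictions, which are discarded."""
    kept = []
    for idx, pmap in enumerate(prob_maps):
        mean_entropy = sum(pixel_entropy(px) for px in pmap) / len(pmap)
        if mean_entropy <= max_mean_entropy:
            kept.append(idx)
    return kept

# Toy example: one confident map (near one-hot pixels) and one uncertain
# map (uniform over the 4 classes, entropy ln 4 ~ 1.39 per pixel).
confident = [[0.97, 0.01, 0.01, 0.01]] * 4
uncertain = [[0.25, 0.25, 0.25, 0.25]] * 4
print(filter_pseudo_labels([confident, uncertain]))  # -> [0]
```

In the full pipeline, such a filter would be applied after the feature-space similarity selection, so that only candidates that are both representative of the labeled set and confidently predicted contribute pseudo-labels to training.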
