@InProceedings{Chatterjee_2025_ICCV,
    author    = {Chatterjee, Abhiroop and Ghosh, Susmita and Ghosh, Ashish},
    title     = {Context-Aware Masking and Learnable Diffusion-Guided Patch Refinement in Transformers via Sparse Supervision for Hyperspectral Image Classification},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {2927--2936}
}
Context-Aware Masking and Learnable Diffusion-Guided Patch Refinement in Transformers via Sparse Supervision for Hyperspectral Image Classification
Abstract
In hyperspectral image analysis, vision transformers are often sensitive to spectral-spatial perturbations and learn inefficiently in label-scarce regimes. To address these issues, we introduce contextually perturbed diffusion-guided active learning (CPDGAL), which integrates diffusion-guided feature refinement (DGFR) and contextualized masking (CM). DGFR first injects structured perturbations into patch embeddings and then reconstructs clean patches via a diffusion-based denoising mechanism; in doing so, it refines the learned features while implicitly calibrating aleatoric uncertainty. CM applies attention-guided probabilistic masking and enforces context-aware reconstruction to improve generalization. We also introduce a sparse supervision scheme for label-scarce scenarios that uses confidence-aware ranking to select uncertain samples, prioritizing challenging data for efficient retraining through active learning (AL). Experiments on benchmark datasets validate the effectiveness of CPDGAL: it achieves 97.34% overall accuracy (OA) on Indian Pines, 99.87% on Salinas, and 98.94% on Botswana with a lightweight architecture (0.09M parameters, 5.17 MFLOPs), and it outperforms sixteen state-of-the-art (SOTA) CNN- and transformer-based methods. Our framework also generalizes better than the vision transformer in extremely low-label settings.
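The abstract names three mechanisms but gives no formulas. The NumPy sketch below illustrates generic, textbook versions of each, under loudly stated assumptions: DDPM-style closed-form Gaussian noising stands in for DGFR's "structured perturbations" (with its exact inverse standing in for the denoising step, where a learned network would normally predict the noise), attention-proportional sampling without replacement stands in for CM's probabilistic masking, and least-confidence ranking (lowest maximum class probability) stands in for the confidence-aware selection. All function names and parameters (`inject_noise`, `reconstruct_clean`, `attention_guided_mask`, `select_uncertain`, `betas`, `mask_ratio`, `budget`) are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def inject_noise(patches, t, betas, rng):
    """Perturb patch embeddings with the DDPM closed-form forward process.

    Assumption: Gaussian forward diffusion stands in for DGFR's
    "structured perturbations"; the paper's exact scheme may differ.
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]  # fraction of signal variance kept at step t
    noise = rng.standard_normal(patches.shape)
    noisy = np.sqrt(alpha_bar) * patches + np.sqrt(1.0 - alpha_bar) * noise
    return noisy, noise


def reconstruct_clean(noisy, predicted_noise, t, betas):
    """Recover clean patches from a noise estimate by inverting the forward step.

    In practice `predicted_noise` would come from a learned denoiser (assumed here).
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]
    return (noisy - np.sqrt(1.0 - alpha_bar) * predicted_noise) / np.sqrt(alpha_bar)


def attention_guided_mask(attn, mask_ratio, rng):
    """Pick patches to mask, sampling without replacement with probability
    proportional to their (1-D, non-negative) attention scores.

    Assumption: high-attention patches are masked more often, so the model
    must reconstruct them from surrounding context.
    """
    probs = attn / attn.sum()
    k = int(round(mask_ratio * attn.size))
    idx = rng.choice(attn.size, size=k, replace=False, p=probs)
    mask = np.zeros(attn.size, dtype=bool)
    mask[idx] = True
    return mask


def select_uncertain(class_probs, budget):
    """Return indices of the `budget` least-confident samples, where
    confidence is the maximum predicted class probability per row."""
    confidence = class_probs.max(axis=1)
    return np.argsort(confidence, kind="stable")[:budget]
```

In an active-learning round, a function like `select_uncertain` would be applied to softmax outputs on the unlabeled pool, and the returned indices would be sent for labeling before retraining. Least-confidence ranking is only one common choice; margin or entropy ranking would slot in the same way.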
