3D-CmT: 3D-CNN meets Transformer for Hyperspectral Image Classification

Sunita Arya, Shiv Ram Dubey, S Manthira Moorthi, Debajyoti Dhar, Satish Kumar Singh; Proceedings of the Asian Conference on Computer Vision (ACCV) Workshops, 2024, pp. 679-695

Abstract


In recent years, the combined use of the Vision Transformer (ViT) and the Convolutional Neural Network (CNN) has shown promising results on tasks related to satellite imagery. In this study, we propose a 3D-CmT (3D-CNN meets Transformer) model for hyperspectral image classification, which leverages the complementary capabilities of the 3D-CNN and the ViT to effectively classify hyperspectral images. To learn local features of the narrow, contiguous spectral bands of hyperspectral images, we employ a 3D-CNN in the spectral feature extraction (SFE) module. Subsequently, a transformer encoder (TE) module is applied on top of the 3D-CNN to incorporate global attention and model long-range dependencies in the spatial information of the images. We conducted experiments on commonly used hyperspectral image datasets and performed various ablation studies, such as evaluating the impact of the image patch size and of different percentages of training samples. The performance of our proposed model is comparable to that of other CNN-based, transformer-based, and hybrid CNN-Transformer models in terms of both model parameters and accuracy. In addition, we conducted quantitative and qualitative analyses to assess the performance of our model.
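The two-stage design described above (a 3D-CNN spectral feature extraction module whose output is tokenized and passed through a transformer encoder for global spatial attention) can be sketched as follows. This is a minimal illustrative implementation in PyTorch, not the authors' exact architecture: the kernel sizes, channel counts, embedding dimension, and number of encoder layers are all assumptions chosen for clarity.

```python
# Illustrative sketch of a 3D-CNN + Transformer hybrid for hyperspectral
# classification, following the SFE -> TE pipeline described in the abstract.
# All layer sizes are assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class CmT3DSketch(nn.Module):
    def __init__(self, bands=30, num_classes=16, embed_dim=64, depth=2, heads=4):
        super().__init__()
        # SFE module: 3D convolutions slide along the spectral axis to learn
        # local features of the narrow, contiguous spectral bands.
        self.sfe = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(7, 3, 3), padding=(0, 1, 1)),
            nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=(5, 3, 3), padding=(0, 1, 1)),
            nn.ReLU(),
        )
        reduced = bands - 7 + 1 - 5 + 1  # spectral size left after both convs
        # Each spatial position becomes one token; project its stacked
        # spectral features to the transformer embedding dimension.
        self.proj = nn.Linear(16 * reduced, embed_dim)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=heads, dim_feedforward=128, batch_first=True
        )
        # TE module: global attention models long-range spatial dependencies.
        self.te = nn.TransformerEncoder(enc_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                 # x: (B, 1, bands, H, W) patch
        f = self.sfe(x)                   # (B, 16, reduced, H, W)
        B, C, D, H, W = f.shape
        tokens = f.reshape(B, C * D, H * W).transpose(1, 2)  # (B, H*W, C*D)
        tokens = self.te(self.proj(tokens))                  # (B, H*W, embed_dim)
        return self.head(tokens.mean(dim=1))                 # (B, num_classes)

model = CmT3DSketch()
logits = model(torch.randn(2, 1, 30, 9, 9))  # batch of two 9x9x30 patches
print(logits.shape)  # torch.Size([2, 16])
```

The patch size (9×9) and band count (30) here mirror common hyperspectral benchmark settings; the paper's ablations vary the patch size, which in this sketch only changes the number of spatial tokens fed to the encoder.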

Related Material


[pdf]
[bibtex]
@InProceedings{Arya_2024_ACCV,
  author    = {Arya, Sunita and Dubey, Shiv Ram and Moorthi, S Manthira and Dhar, Debajyoti and Singh, Satish Kumar},
  title     = {3D-CmT: 3D-CNN meets Transformer for Hyperspectral Image Classification},
  booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV) Workshops},
  month     = {December},
  year      = {2024},
  pages     = {679-695}
}