Multi-Scale Hybrid CNN-Transformer for Smoke Detection in Satellite Images

Tony Zhang, Robert P. Dick; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 2920-2926

Abstract


This paper addresses smoke detection in satellite images, which is crucial for identifying and mitigating wildfires. Smoke varies in size, shape, and texture, which makes it difficult to classify in remote sensing images. Several CNN architectures have been proposed for smoke detection, but they are limited in modeling long-range context because convolution is biased toward learning local relationships. To address this limitation, the paper describes a hybrid network that combines CNN and transformer architectures. The transformer uses multi-head attention to learn long-range, global relationships among different image regions. First, multi-scale features are extracted by adding a transformer after each CNN layer. Then another transformer layer is appended to capture relationships among features with different receptive fields, which significantly improves model accuracy. The proposed approach is evaluated on USTC_SmokeRS, a dataset of remote sensing images for smoke detection, and outperforms prior methods.
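The abstract describes the architecture only at a high level, so the following NumPy sketch illustrates the general data flow under explicit assumptions. It is not the authors' implementation. Assumptions: each "CNN layer" is replaced by a simplified stand-in (2x2 average pooling plus a 1x1 channel projection and ReLU), each transformer is a single pre-norm multi-head self-attention block with the feed-forward sublayer omitted, each scale is summarized by mean-pooling its tokens into one vector, and the final cross-scale transformer attends over those per-scale vectors. All function names (`cnn_stage`, `transformer_block`, `build_model`, `forward`), layer widths, and the random, untrained weights are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each token (row) to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def init_attention(dim, rng):
    # Random (untrained) query/key/value/output projections.
    s = 1.0 / np.sqrt(dim)
    return {k: rng.normal(0.0, s, (dim, dim)) for k in ("q", "k", "v", "o")}


def multi_head_self_attention(tokens, p, num_heads):
    # tokens: (N, dim). Every token attends to every other token, so
    # relationships between distant image regions are modeled directly.
    n, dim = tokens.shape
    hd = dim // num_heads
    q, k, v = ((tokens @ p[m]).reshape(n, num_heads, hd).transpose(1, 0, 2)
               for m in ("q", "k", "v"))
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(hd))  # (heads, N, N)
    out = (weights @ v).transpose(1, 0, 2).reshape(n, dim)
    return out @ p["o"]


def transformer_block(tokens, p, num_heads):
    # Pre-norm residual attention block (feed-forward sublayer omitted).
    return tokens + multi_head_self_attention(layer_norm(tokens), p, num_heads)


def cnn_stage(x, w):
    # Hypothetical stand-in for a CNN layer: 2x2 average pooling (stride 2),
    # then a 1x1 convolution (channel projection) and ReLU.
    c, h, wd = x.shape
    x = x[:, : h // 2 * 2, : wd // 2 * 2]
    x = x.reshape(c, h // 2, 2, wd // 2, 2).mean(axis=(2, 4))
    return np.maximum(np.einsum("chw,cd->dhw", x, w), 0.0)


def build_model(channels, fuse_dim, num_classes, rng):
    # channels like [3, 16, 32, 64] give three CNN stages, each followed by attention.
    m = {"convs": [], "attn": [], "proj": []}
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        m["convs"].append(rng.normal(0.0, 1.0 / np.sqrt(c_in), (c_in, c_out)))
        m["attn"].append(init_attention(c_out, rng))
        m["proj"].append(rng.normal(0.0, 1.0 / np.sqrt(c_out), (c_out, fuse_dim)))
    m["fuse"] = init_attention(fuse_dim, rng)
    m["head"] = rng.normal(0.0, 1.0 / np.sqrt(fuse_dim), (fuse_dim, num_classes))
    return m


def forward(image, m, num_heads=4):
    # image: (channels, H, W). Returns unnormalized class logits.
    x, scale_tokens = image, []
    for w, p, proj in zip(m["convs"], m["attn"], m["proj"]):
        x = cnn_stage(x, w)                          # local features, coarser scale
        d, h, wd = x.shape
        tokens = transformer_block(x.reshape(d, h * wd).T, p, num_heads)  # global context
        x = tokens.T.reshape(d, h, wd)               # back to a feature map
        scale_tokens.append(tokens.mean(axis=0) @ proj)  # one summary token per scale
    # Extra transformer over per-scale tokens: relates different receptive fields.
    fused = transformer_block(np.stack(scale_tokens), m["fuse"], num_heads)
    return fused.mean(axis=0) @ m["head"]


rng = np.random.default_rng(0)
model = build_model([3, 16, 32, 64], fuse_dim=32, num_classes=6, rng=rng)  # illustrative sizes
image = rng.random((3, 64, 64))  # random stand-in for a satellite image tile
logits = forward(image, model)
print(logits.shape)  # (6,)
```

The design choice to illustrate is the split of responsibilities: convolution supplies local, progressively coarser features; attention inside each stage adds global context at that scale; and the final attention block lets features from different receptive fields inform one another before classification. Channel widths must be divisible by the number of heads.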

Related Material


@InProceedings{Zhang_2025_ICCV,
    author    = {Zhang, Tony and Dick, Robert P.},
    title     = {Multi-Scale Hybrid CNN-Transformer for Smoke Detection in Satellite Images},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {2920-2926}
}