Hierarchical Feature Aggregation Network Based on Swin Transformer for Medical Image Segmentation

Hayato Iyoda, Yongqing Sun, Xian-Hua Han; Proceedings of the Asian Conference on Computer Vision (ACCV) Workshops, 2024, pp. 453-465

Abstract


Semantic segmentation plays a crucial role in computer-aided medical image analysis by identifying important and useful regions, which are vital for various diagnostic tasks. Recently, vision transformers (ViTs) have emerged as the leading approach in medical image segmentation, outperforming traditional convolutional neural networks (CNNs). Most strategies for incorporating ViTs into medical segmentation adopt the widely used U-shaped architecture (U-Net) and replace the convolution blocks in both the encoder and decoder paths with transformer blocks. However, it remains uncertain which components of the incorporated transformer blocks contribute most to segmentation performance in the medical domain. This study presents a hierarchical feature aggregation method built on hierarchical Transformer features to improve the performance of ViT-based architectures in data-constrained medical image segmentation. Specifically, our approach uses a hierarchical vision Transformer as the main encoder path to extract multi-scale semantic features, and employs several residual blocks to obtain local representations with detailed spatial information. We then introduce a hierarchical feature aggregation module (HFAM) that serves as the decoder path, fusing the multi-scale semantic features with the residual spatial features. Compared with existing transformer-based U-Nets, the proposed HFAM not only combines diverse contexts effectively but can also potentially reduce computational complexity. Experiments on three different medical image segmentation benchmarks demonstrate that our proposed method consistently outperforms the conventional U-Net and various Transformer-based U-Nets.
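
The abstract does not specify how HFAM fuses the encoder and residual features. The following is a minimal NumPy sketch of one generic multi-scale aggregation scheme. It assumes each feature map is projected to a shared channel width with a 1x1 convolution, upsampled by nearest-neighbour interpolation to the resolution of the finest (residual-branch) feature, and summed before a per-pixel classification head. The function names (`conv1x1`, `upsample_nearest`, `aggregate`), the summation fusion, and all tensor sizes are hypothetical illustrations, not the authors' HFAM. The Swin outputs are random stand-ins at strides 4, 8, 16, and 32, following the usual Swin Transformer stage layout.

```python
import numpy as np


def conv1x1(x, weight):
    """Pointwise (1x1) convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def aggregate(features, weights, out_hw):
    """Hypothetical aggregation: project every map to a shared width, upsample, and sum."""
    fused = np.zeros((weights[0].shape[0], *out_hw))
    for feat, w in zip(features, weights):
        factor = out_hw[0] // feat.shape[1]  # assumes square maps with integer scale ratios
        fused += upsample_nearest(conv1x1(feat, w), factor)
    return fused


rng = np.random.default_rng(0)
C, D, NUM_CLASSES = 8, 16, 3  # illustrative sizes, not taken from the paper

# Random stand-ins for Swin stage outputs of a 64x64 input (strides 4, 8, 16, 32)
# plus a stride-2 residual-branch feature carrying finer spatial detail.
swin_feats = [rng.standard_normal((C * 2**i, 64 // 2**(i + 2), 64 // 2**(i + 2))) for i in range(4)]
residual_feat = rng.standard_normal((C, 32, 32))
features = swin_feats + [residual_feat]

weights = [rng.standard_normal((D, f.shape[0])) for f in features]
fused = aggregate(features, weights, out_hw=(32, 32))

head = rng.standard_normal((NUM_CLASSES, D))
logits = conv1x1(fused, head)  # per-pixel class scores at stride 2
mask = logits.argmax(axis=0)   # predicted label map
print(fused.shape, logits.shape, mask.shape)
```

Because each scale contributes through a single pointwise projection and a sum, a decoder of this form avoids stacking transformer blocks at every decoding stage. This is one plausible way an aggregation-style decoder could cost less than a transformer-based U-Net decoder, though the paper's own design may differ.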

Related Material

@InProceedings{Iyoda_2024_ACCV,
    author    = {Iyoda, Hayato and Sun, Yongqing and Han, Xian-Hua},
    title     = {Hierarchical Feature Aggregation Network Based on Swin Transformer for Medical Image Segmentation},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV) Workshops},
    month     = {December},
    year      = {2024},
    pages     = {453-465}
}