@InProceedings{Karimijafarbigloo_2025_WACV,
    author    = {Karimijafarbigloo, Sanaz and Kolahi, Sina Ghorbani and Azad, Reza and Bagci, Ulas and Merhof, Dorit},
    title     = {Frequency-Domain Refinement of Vision Transformers for Robust Medical Image Segmentation under Degradation},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {9158-9167}
}
Frequency-Domain Refinement of Vision Transformers for Robust Medical Image Segmentation under Degradation
Abstract
Medical image segmentation is crucial for precise diagnosis, treatment planning, and disease monitoring in clinical settings. While convolutional neural networks (CNNs) have achieved remarkable success, they struggle to model long-range dependencies. Vision Transformers (ViTs) address this limitation by leveraging self-attention mechanisms to capture global contextual information. However, ViTs often fall short in local feature description, which is crucial for precise segmentation. To address this issue, we reformulate self-attention in the frequency domain to enhance both local and global feature representation. Our approach, the Enhanced Wave Vision Transformer (EW-ViT), incorporates wavelet decomposition within the self-attention block to adaptively refine feature representations in the low- and high-frequency components. We also introduce the Prompt-Guided High-Frequency Refiner (PGHFR) module to handle image degradation, which mainly affects high-frequency components. This module uses implicit prompts to encode degradation-specific information and adjust high-frequency representations accordingly. Additionally, we apply a contrastive learning strategy to maintain feature consistency and ensure robustness against noise, leading to state-of-the-art (SOTA) performance in medical image segmentation, especially under various degradation conditions. Source code is available on GitHub.
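The core idea of wavelet decomposition inside a self-attention block can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a single-level Haar transform, identity Q/K/V projections, and a scalar `high_gain` as a stand-in for the learned, prompt-guided refinement of high-frequency components described in the abstract; all function names are hypothetical.

```python
import numpy as np

def haar_decompose(x):
    """Single-level 1D Haar wavelet transform along the token axis.
    x: (num_tokens, dim) with an even number of tokens.
    Returns (low, high), each of shape (num_tokens // 2, dim)."""
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2)   # approximation (low-frequency) coefficients
    high = (even - odd) / np.sqrt(2)  # detail (high-frequency) coefficients
    return low, high

def haar_reconstruct(low, high):
    """Inverse of haar_decompose: perfectly reconstructs the input tokens."""
    even = (low + high) / np.sqrt(2)
    odd = (low - high) / np.sqrt(2)
    x = np.empty((low.shape[0] * 2, low.shape[1]))
    x[0::2], x[1::2] = even, odd
    return x

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Toy single-head self-attention with identity Q/K/V projections."""
    scale = 1.0 / np.sqrt(x.shape[-1])
    attn = softmax(x @ x.T * scale)
    return attn @ x

def wavelet_attention(x, high_gain=1.0):
    """Attend separately over low- and high-frequency token components,
    rescale the high-frequency branch (a stand-in for degradation-aware
    refinement), then invert the wavelet transform."""
    low, high = haar_decompose(x)
    low = self_attention(low)
    high = self_attention(high) * high_gain
    return haar_reconstruct(low, high)
```

Processing the two sub-bands separately lets the model treat fine detail (high frequencies, the part most corrupted by degradation) differently from coarse structure (low frequencies), while the inverse transform keeps the output in the original token space.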