@InProceedings{Shabani_2025_CVPR,
  author    = {Shabani, Siyavash and Mohammed, Sahar and Parvin, Bahram},
  title     = {A Novel 3D Decoder with Weighted and Learnable Triple Attention for 3D Microscopy Image Segmentation},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2025},
  pages     = {4708-4717}
}
A Novel 3D Decoder with Weighted and Learnable Triple Attention for 3D Microscopy Image Segmentation
Abstract
Deep neural networks are the backbone of 3D medical image segmentation architectures and have shown exceptional performance across applications. However, their increasing model size and computational demands present significant challenges for deployment in real-world medical settings. We introduce the Weighted and Learnable Triple Attention Network (WLTA-Net), a high-performance and efficient model, to further advance this area. The WLTA-Net encoder consists of a Swin Transformer, whose multi-scale outputs at different resolutions are fused by the proposed efficient Weighted and Learnable Triple Attention (WLTA) blocks at four levels, from bottom to top. To demonstrate its performance, WLTA-Net was first evaluated on public clinical datasets and then on 3D organoid datasets, achieving Dice and PQ scores of 93.47±0.03 and 92.36±0.04, respectively. The improved performance comes with the added benefit of reduced complexity, with 35 million parameters and a lower computational cost in terms of GFLOPs. The code is available here: https://github.com/Siyavashshabani/WLTA-Net
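The abstract describes a Swin Transformer encoder whose four multi-scale outputs are fused bottom-up by WLTA blocks in the decoder. The sketch below is a hypothetical PyTorch illustration of that fusion pattern only; the `TripleAttentionFusion` and `WLTADecoderSketch` modules, the channel sizes, and the axis-wise gating scheme are assumptions for illustration and are not taken from the paper or its repository.

```python
import torch
import torch.nn as nn


class TripleAttentionFusion(nn.Module):
    """Illustrative fusion block (assumption, not the paper's WLTA block):
    attends along the three spatial axes of a 3D feature map and combines
    the branches with learnable scalar weights."""

    def __init__(self, channels):
        super().__init__()
        # one lightweight 1x1x1 gate per spatial axis (depth, height, width)
        self.gates = nn.ModuleList(
            [nn.Conv3d(channels, channels, kernel_size=1) for _ in range(3)]
        )
        # learnable weights for the three attention branches
        self.branch_weights = nn.Parameter(torch.ones(3))

    def forward(self, x):
        # x: (B, C, D, H, W)
        outs = []
        for axis, gate in zip((2, 3, 4), self.gates):
            # pool over one spatial axis, gate, and rescale the input
            pooled = x.mean(dim=axis, keepdim=True)
            outs.append(x * torch.sigmoid(gate(pooled)))
        w = torch.softmax(self.branch_weights, dim=0)
        return w[0] * outs[0] + w[1] * outs[1] + w[2] * outs[2]


class WLTADecoderSketch(nn.Module):
    """Toy decoder that fuses four multi-scale encoder features bottom-up.
    Channel sizes are placeholder values, not the published configuration."""

    def __init__(self, enc_channels=(384, 192, 96, 48), num_classes=2):
        super().__init__()
        self.fusions = nn.ModuleList(
            [TripleAttentionFusion(c) for c in enc_channels]
        )
        self.reduce = nn.ModuleList(
            [nn.Conv3d(c_in, c_out, kernel_size=1)
             for c_in, c_out in zip(enc_channels[:-1], enc_channels[1:])]
        )
        self.up = nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False)
        self.head = nn.Conv3d(enc_channels[-1], num_classes, kernel_size=1)

    def forward(self, feats):
        # feats: list of encoder outputs, deepest (lowest resolution) first
        x = self.fusions[0](feats[0])
        for fusion, reduce, skip in zip(self.fusions[1:], self.reduce, feats[1:]):
            x = reduce(self.up(x)) + skip  # upsample, match channels, add skip
            x = fusion(x)
        return self.head(self.up(x))


if __name__ == "__main__":
    # dummy 4-stage encoder outputs, deepest first
    decoder = WLTADecoderSketch()
    feats = [torch.randn(1, c, 4 * 2**i, 4 * 2**i, 4 * 2**i)
             for i, c in enumerate((384, 192, 96, 48))]
    print(decoder(feats).shape)  # torch.Size([1, 2, 64, 64, 64])
```

The sketch keeps the structural idea stated in the abstract (four fusion levels applied bottom-up to multi-scale encoder features with learnable, weighted attention branches); the actual WLTA block design, weighting scheme, and hyperparameters should be taken from the linked repository.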