- [pdf] [supp]
Full-scale Selective Transformer for Semantic Segmentation
In this paper, we rethink multi-scale feature fusion from two perspectives (scale-level and spatial-level) and propose a full-scale selective fusion strategy for semantic segmentation. Based on this strategy, we design a novel segmentation network, named Full-scale Selective Transformer (FSFormer). Specifically, FSFormer adaptively selects partial tokens from all tokens at all scales to construct a token subset of interest for each scale, so that each token interacts only with the tokens in its corresponding subset of interest. The proposed full-scale selective fusion strategy not only suppresses noisy information propagation but also reduces computational cost. We evaluate FSFormer on four challenging semantic segmentation benchmarks, namely PASCAL Context, ADE20K, COCO-Stuff 10K, and Cityscapes, on which it outperforms state-of-the-art methods.
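The core idea of selecting a per-scale token subset and restricting attention to it can be illustrated with a minimal sketch. This is an assumption-laden toy version: the scoring rule here (relevance to each scale's mean token) is a stand-in for the paper's learned selection mechanism, and the attention is plain single-head dot-product attention without learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def full_scale_selective_fusion(scale_tokens, k):
    """Toy full-scale selective fusion.

    For each scale, pick the k highest-scoring tokens from the
    concatenation of tokens at all scales (the "token subset of
    interest"), then let that scale's tokens attend only to this
    subset instead of to every token at every scale.
    """
    all_tokens = np.concatenate(scale_tokens, axis=0)  # (N_total, d)
    d = all_tokens.shape[1]
    fused = []
    for q in scale_tokens:
        # Hypothetical scoring: similarity of every token to this
        # scale's mean token (the real criterion is learned).
        scores = all_tokens @ q.mean(axis=0)           # (N_total,)
        idx = np.argsort(scores)[-k:]                  # indices of subset
        subset = all_tokens[idx]                       # (k, d)
        # Each token attends only within its subset of interest,
        # so cost scales with k rather than N_total.
        attn = softmax(q @ subset.T / np.sqrt(d), axis=-1)  # (n_s, k)
        fused.append(attn @ subset)                    # (n_s, d)
    return fused

# Example: three scales with 16, 9, and 4 tokens of dimension 8.
rng = np.random.default_rng(0)
scale_tokens = [rng.standard_normal((n, 8)) for n in (16, 9, 4)]
out = full_scale_selective_fusion(scale_tokens, k=6)
```

Because each scale's tokens attend to only k selected tokens rather than all tokens at all scales, the attention cost drops from O(N_total^2) toward O(N_total * k), which is the efficiency argument the abstract makes.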